Hi David,

I don't think changing the segment ID to a UUID is a good idea; it will cause usability issues.
1. Seeing a UUID-named directory in the table structure would be weird and uninformative.
2. The show segments command would have the same problem.

Thanks
Kunal Kapoor

On Fri, Sep 4, 2020 at 8:38 AM David CaiQiang <david.c...@gmail.com> wrote:

> [Background]
> 1. In some scenarios, two loading/compaction jobs may write data to the same
> segment, which causes data confusion and breaks some features.
> 2. Loading/compaction/update/delete operations need to clean stale data
> before execution. Cleaning stale data is a high-risk operation: if an
> exception occurs, it can delete valid data. If the system doesn't clean
> stale data, in some scenarios that data will be added to a new merged index
> file and can be queried.
> 3. Loading/compaction takes a long time, and in some scenarios the lock is
> also held for a long time.
>
> [Motivation & Goal]
> We should avoid data confusion and the risk of cleaning stale data. Maybe we
> can use a UUID as the segment id to avoid these troubles. We might even be
> able to do loading/compaction without the segment/compaction lock.
>
> [Modification]
> 1. segment id
> Use a UUID as the segment id instead of the unique numeric value.
>
> 2. segment layout
> a) move the segment data folder into the table folder
> b) move the carbonindexmerge file into the Metadata/segments folder
>
> tableFolder
>   UUID1
>     |_xxx.carbondata
>     |_xxx.carbonindex
>   UUID2
>   Metadata
>     |_segments
>       |_UUID1_timestamp1.segment (segment index summary)
>       |_UUID1_timestamp1.carbonindexmerge (segment index detail)
>     |_schema
>     |_tablestatus
>   LockFiles
>
> partitionTableFolder
>   partkey=value1
>     |_xxx.carbondata
>     |_xxx.carbonindex
>   partkey=value2
>   Metadata
>     |_segments
>       |_UUID1_timestamp1.segment (segment index summary)
>       |_partkey=value1
>         |_UUID1_timestamp1.carbonindexmerge (segment index detail)
>       |_partkey=value2
>     |_schema
>     |_tablestatus
>   LockFiles
>
> 3. segment management
> Extract a segment interface that supports open/close, read/write, and a
> segment-level index pruning API.
> The segment should support multiple data source types: file formats (carbon,
> parquet, orc...), HBase...
>
> 4. clean stale data
> It will become an optional operation.
>
> -----
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
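
For readers following the thread, the non-partitioned layout in item 2 of the quoted [Modification] section can be sketched roughly as below. This is only an illustration of the naming scheme, not CarbonData code: the helper names `new_segment_id` and `segment_paths`, the example table path, and the use of a millisecond timestamp are all assumptions made for the sketch.

```python
import uuid
from pathlib import PurePosixPath


def new_segment_id() -> str:
    """Return a random UUID string to serve as a segment id (hypothetical helper)."""
    return str(uuid.uuid4())


def segment_paths(table_folder: str, segment_id: str, timestamp: int) -> dict:
    """Build the paths the proposal describes for a non-partitioned table.

    Assumed layout (taken from the quoted tree):
      <table>/<segment_id>/                       data and index files
      <table>/Metadata/segments/<id>_<ts>.segment
      <table>/Metadata/segments/<id>_<ts>.carbonindexmerge
    """
    table = PurePosixPath(table_folder)
    segments_dir = table / "Metadata" / "segments"
    stem = f"{segment_id}_{timestamp}"
    return {
        "data_folder": table / segment_id,
        "segment_file": segments_dir / f"{stem}.segment",
        "merge_index_file": segments_dir / f"{stem}.carbonindexmerge",
    }


if __name__ == "__main__":
    # Two concurrent jobs each draw their own id, so their data folders differ.
    job_a = segment_paths("/warehouse/t1", new_segment_id(), 1599190000000)
    job_b = segment_paths("/warehouse/t1", new_segment_id(), 1599190000000)
    print(job_a["data_folder"])
    print(job_b["data_folder"])
```

The sketch shows both sides of the discussion: each job writes to its own folder without coordinating on a numeric counter, but the resulting directory names carry no ordering or meaning for someone browsing the table folder, which is the usability concern raised in the reply.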