imay commented on issue #1684: [Proposal] Bitmap Index File Format for V2 
segment
URL: 
https://github.com/apache/incubator-doris/issues/1684#issuecomment-526075357
 
 
   > @imay thanks for the comments.
   > 
   > > If we support other types of index, for example B+Tree, do we need to 
store all index types in one index file?
   > 
   > No, we don't. Different index types have different structures and 
metadata, so I don't think it's a good idea to write them into a single file.
   > 
   > > I think we should keep files immutable, so I don't agree with the second 
option for adding a new index. What about having multiple index files for one data 
file? When we add a new index, we can create a new index file for it, and we can 
merge all index files into one when we do compaction.
   > 
   > Agreed. This way the I/O cost of adding a new index is minimal, but the 
reader now needs to know which indexes are available for this segment and 
where each one is located. Where do you think is the best place for that 
information? Perhaps in RowsetMetaPB?
   
   I think it's OK to save this information in RowsetMetaPB. RowsetMetaPB is 
treated as immutable, so when we create a new index for a rowset, we create a new 
RowsetMetaPB for a new rowset and leverage link-schema-change to create hard 
links to the old files. Then we can generate the index for this rowset and record 
it in the new RowsetMetaPB.
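
A minimal sketch of that flow, to make the steps concrete. This is not Doris code: `RowsetMeta`, `IndexEntry`, `add_index`, and the `<column>.<type>.idx` file naming are all hypothetical stand-ins for RowsetMetaPB and the real link-schema-change path. It only shows the idea: the old meta is never modified, the old files are hard-linked into the new rowset, and the new index file is recorded in a new meta object.

```python
import os
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical record of one index file belonging to a rowset."""
    column: str
    index_type: str  # e.g. "bitmap"
    file_name: str


@dataclass(frozen=True)
class RowsetMeta:
    """Hypothetical, immutable stand-in for RowsetMetaPB."""
    rowset_id: str
    data_files: Tuple[str, ...]
    indexes: Tuple[IndexEntry, ...] = ()


def add_index(old: RowsetMeta, old_dir: str, new_dir: str, new_rowset_id: str,
              column: str, index_type: str, payload: bytes) -> RowsetMeta:
    """Create a new rowset that shares the old files and adds one index file."""
    os.makedirs(new_dir, exist_ok=True)

    # Hard-link every existing file; no data is copied or rewritten.
    existing = old.data_files + tuple(e.file_name for e in old.indexes)
    for name in existing:
        os.link(os.path.join(old_dir, name), os.path.join(new_dir, name))

    # Write the new index as its own file (assumed naming scheme).
    index_name = f"{column}.{index_type}.idx"
    with open(os.path.join(new_dir, index_name), "wb") as f:
        f.write(payload)

    # Return a new meta object; the old one is left untouched.
    entry = IndexEntry(column, index_type, index_name)
    return replace(old, rowset_id=new_rowset_id, indexes=old.indexes + (entry,))
```

With this shape, the reader can find every available index and its location from the `indexes` list alone, and compaction would be the natural point to merge the per-index files.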
   
   > 
   > > Currently we store the dictionary in two levels. How much data can this 
support? Why don't we abstract this into a B-tree, which can support multiple 
layers? Here is Kudu's implementation, which we can use as a reference.
   > 
   > Since the dictionary is per segment, it won't be very large, so I think 
two levels are enough for most datasets. It's also a lot simpler and easier to 
implement than a B-tree. I think we should start with the simpler approach 
and switch to the more complicated one only when it's proven to be really 
necessary. Thanks for the link, I'll take a look.
   
   If we can make it easy to change, I think it's OK.
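
One way to keep it easy to change is to have callers depend only on a lookup method, so that a multi-level structure could replace the two-level one later. The sketch below is an assumption about what "two-level" could look like, not the proposed file format: sorted dictionary values are split into fixed-size pages, and a top-level list of each page's first key selects the page. The class name, `page_size`, and the in-memory layout are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


class TwoLevelDictionary:
    """Hypothetical two-level dictionary: first keys on top, sorted pages below."""

    def __init__(self, sorted_values: Sequence[str], page_size: int = 4):
        self.page_size = page_size
        # Level 2: fixed-size pages of sorted values.
        self.pages = [list(sorted_values[i:i + page_size])
                      for i in range(0, len(sorted_values), page_size)]
        # Level 1: the first key of each page.
        self.first_keys = [page[0] for page in self.pages]

    def lookup(self, value: str) -> Optional[int]:
        """Return the ordinal of value in the dictionary, or None if absent."""
        # Find the last page whose first key is <= value.
        page_no = bisect_right(self.first_keys, value) - 1
        if page_no < 0:
            return None
        page = self.pages[page_no]
        # Binary search inside that single page.
        pos = bisect_left(page, value)
        if pos < len(page) and page[pos] == value:
            return page_no * self.page_size + pos
        return None
```

A reader would need the top level plus one page per lookup. If per-segment dictionaries ever outgrow that, only the class behind `lookup` would have to change.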
   
