kangpinghuang opened a new issue #1305: Add new file format for storage segment
URL: https://github.com/apache/incubator-doris/issues/1305

**Is your feature request related to a problem? Please describe.**

The segment format currently used in BE storage is ORC-like. It has several problems:

1. The file header is modified after all data has been flushed. This does not work in a cloud environment, because files in distributed file systems (e.g. HDFS) and object stores such as S3 do not support random writes.
2. Random seeks. To read a stream, the reader must first read the StreamHead (8 bytes) and only then read the stream data. I don't think this mechanism is good.
3. There is no block cache.
4. Strings are stored in plain encoding.
5. It is hard to add secondary indexes.
6. Data is stored in blocks with a fixed number of rows.
...

So I would like to add a new segment format to BE to solve the problems mentioned above. The goals include:

1. Write the file meta to the footer of the segment file.
2. Support a block cache.
3. Support secondary indexes, e.g. bitmap indexes.
4. Support dictionary encoding for string storage.
5. Construct blocks of a configured size.
6. Make it easy to extend encodings and compression.
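To make goal 1 concrete, here is a minimal sketch of why a footer-based layout fits append-only storage. It assumes a hypothetical layout `[block 0]...[block n-1][meta][meta_len: u32][magic: u32]`, and every name in it (`SegmentWriter`, `read_footer`, `read_block`, `kMagic`) is invented for illustration. It is not the format Doris actually adopted. An in-memory `std::string` stands in for an append-only file.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical marker for this sketch only; not a real Doris constant.
constexpr uint32_t kMagic = 0x44534547;

// Location of one data block inside the file.
struct BlockPointer {
    uint64_t offset;
    uint64_t size;
};

// Appends `v` as a little-endian integer of `width` bytes.
inline void put_uint(std::string& out, uint64_t v, int width) {
    for (int i = 0; i < width; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Reads a little-endian integer of `width` bytes starting at `pos`.
inline uint64_t get_uint(const std::string& in, size_t pos, int width) {
    uint64_t v = 0;
    for (int i = 0; i < width; ++i)
        v |= static_cast<uint64_t>(static_cast<uint8_t>(in[pos + i])) << (8 * i);
    return v;
}

// Hypothetical writer: every byte is appended at the end, so no header is rewritten.
class SegmentWriter {
public:
    void append_block(const std::string& block) {
        pointers_.push_back({static_cast<uint64_t>(file_.size()), static_cast<uint64_t>(block.size())});
        file_ += block;
    }

    // Appends the meta (block count + block index) and a fixed-size trailer.
    std::string finish() {
        std::string meta;
        put_uint(meta, pointers_.size(), 8);
        for (const auto& p : pointers_) {
            put_uint(meta, p.offset, 8);
            put_uint(meta, p.size, 8);
        }
        file_ += meta;
        put_uint(file_, meta.size(), 4);  // meta_len
        put_uint(file_, kMagic, 4);       // magic
        return file_;
    }

private:
    std::string file_;  // stands in for an append-only file (e.g. on HDFS or S3)
    std::vector<BlockPointer> pointers_;
};

// Reads the trailer at the end of the file, then the meta, and returns the block index.
inline std::vector<BlockPointer> read_footer(const std::string& file) {
    if (file.size() < 8) throw std::runtime_error("file too small");
    const size_t trailer = file.size() - 8;
    if (get_uint(file, trailer + 4, 4) != kMagic) throw std::runtime_error("bad magic");
    const uint64_t meta_len = get_uint(file, trailer, 4);
    if (meta_len < 8 || meta_len > trailer) throw std::runtime_error("bad meta length");
    const size_t meta_pos = trailer - meta_len;
    const uint64_t n = get_uint(file, meta_pos, 8);
    if (meta_len != 8 + n * 16) throw std::runtime_error("corrupt meta");
    std::vector<BlockPointer> ptrs;
    for (uint64_t i = 0; i < n; ++i) {
        const size_t p = meta_pos + 8 + i * 16;
        ptrs.push_back({get_uint(file, p, 8), get_uint(file, p + 8, 8)});
    }
    return ptrs;
}

// Fetches one block by its pointer; no per-stream header read is needed first.
inline std::string read_block(const std::string& file, const BlockPointer& p) {
    return file.substr(p.offset, p.size);
}
```

In this sketch the writer never seeks backwards, which is the property that problem 1 asks for. A reader needs one read of the fixed-size trailer and one of the meta. After that it can fetch any block directly. The `(offset, size)` pair could also serve as a natural key for a block cache (goal 2), though that is an assumption of this sketch rather than something the issue specifies.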
