cshannon commented on issue #1327: URL: https://github.com/apache/accumulo/issues/1327#issuecomment-1476947406
Thanks for the feedback. I'll start exploring the JSON option and see how it goes. I may have some follow-up questions about how to handle things once I start working on it, but that's a good starting point.

I was also thinking it would be nice to write the entire DataFileValue as a JSON object (the 3 existing fields plus the new list of Ranges). However, that wouldn't be backwards compatible. We'd need a way to detect the new format, plus code to handle both the old and new ways of storing the data, which adds complexity and could be tricky. So I'm not sure whether that's the best approach, or whether we should instead serialize only the Ranges as a JSON-encoded String and append it as the 4th item in the comma-separated representation of DataFileValue.

For contiguous/overlapping ranges, I plan to collapse them when storing. Keeping as few ranges as possible should improve performance, since there are fewer ranges to track and manage, and it also reduces the amount of data stored in metadata. Range already has a very nice method called [mergeOverlapping](https://github.com/apache/accumulo/blob/540179d1f52dcc478eee3a3ee3c5fac106736c8b/core/src/main/java/org/apache/accumulo/core/data/Range.java#L418) to handle this, and I'm using it in my PR for the ranged file reader when [constructing](https://github.com/apache/accumulo/blob/f4e5e66df9bca02d951a1801b5e9d459d815aea3/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java#L1590) the iterator for fencing. We may actually be able to remove that call from the ranged reader if we merge when storing in metadata: the stored ranges should already be merged, so checking them again would in theory be redundant.
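A minimal sketch of the "append the Ranges as a 4th field" option, assuming the 3 existing fields are always written as `size,numEntries,time` and modeling each range as a plain start/end row pair. The `FencedValueSketch`, `RowRange`, `encode`, and `decodeFields` names are hypothetical and are not Accumulo's `DataFileValue` or `Range` API. The detail worth noting is that the JSON itself contains commas, so a reader has to split with a limit of 4 to keep the JSON intact as one field; a value that splits into only 3 fields is then read as the old format.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch only: not Accumulo's DataFileValue or Range API.
class FencedValueSketch {

  // Hypothetical start/end row pair standing in for an Accumulo Range.
  static final class RowRange {
    final String startRow;
    final String endRow;

    RowRange(String startRow, String endRow) {
      this.startRow = startRow;
      this.endRow = endRow;
    }
  }

  // Quote a string as a JSON string literal, escaping quotes, backslashes and control chars.
  static String jsonString(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      if (c == '"') {
        sb.append("\\\"");
      } else if (c == '\\') {
        sb.append("\\\\");
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.append('"').toString();
  }

  // Encode the ranges as a JSON array of {"start":...,"end":...} objects.
  static String rangesToJson(List<RowRange> ranges) {
    StringJoiner array = new StringJoiner(",", "[", "]");
    for (RowRange r : ranges) {
      array.add("{\"start\":" + jsonString(r.startRow) + ",\"end\":" + jsonString(r.endRow) + "}");
    }
    return array.toString();
  }

  // Without ranges the value keeps the old 3-field form; otherwise the JSON is the 4th field.
  static String encode(long size, long numEntries, long time, List<RowRange> ranges) {
    String base = size + "," + numEntries + "," + time;
    return ranges.isEmpty() ? base : base + "," + rangesToJson(ranges);
  }

  // Split with limit 4 so commas inside the JSON stay in the 4th field.
  // 3 fields means the old format; 4 fields means ranges are present.
  static String[] decodeFields(String encoded) {
    return encoded.split(",", 4);
  }
}
```

This only stays unambiguous if the old format always has exactly 3 fields; if any existing field can be omitted, the position of the JSON field would no longer identify the format.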
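To illustrate the collapsing step without depending on Accumulo, here is a simplified stand-in for the kind of merge `Range.mergeOverlapping` performs, assuming half-open `[start, end)` row intervals with no infinite endpoints. `RangeMergeSketch` and `RowInterval` are hypothetical; Accumulo's `Range` also tracks inclusive/exclusive endpoints and full keys, which this sketch ignores. Sorting by start row makes a single pass sufficient: each interval either extends the last merged interval or starts a new one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: not Accumulo's Range.mergeOverlapping.
class RangeMergeSketch {

  // Hypothetical half-open row interval [start, end).
  static final class RowInterval {
    final String start;
    final String end;

    RowInterval(String start, String end) {
      if (start.compareTo(end) >= 0) {
        throw new IllegalArgumentException("start must be < end");
      }
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  // Collapse overlapping or contiguous intervals into the fewest disjoint intervals.
  static List<RowInterval> mergeOverlapping(Collection<RowInterval> input) {
    List<RowInterval> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparing((RowInterval r) -> r.start));

    List<RowInterval> merged = new ArrayList<>();
    for (RowInterval next : sorted) {
      if (merged.isEmpty()) {
        merged.add(next);
        continue;
      }
      RowInterval last = merged.get(merged.size() - 1);
      // With half-open intervals, next.start <= last.end means overlap or adjacency.
      if (next.start.compareTo(last.end) <= 0) {
        String end = next.end.compareTo(last.end) > 0 ? next.end : last.end;
        merged.set(merged.size() - 1, new RowInterval(last.start, end));
      } else {
        merged.add(next);
      }
    }
    return merged;
  }
}
```

For example, merging `[m, p)`, `[a, c)`, `[c, f)` and `[b, d)` yields `[a, f)` and `[m, p)`: `[a, c)` and `[c, f)` are contiguous, and `[b, d)` overlaps both. If this runs at write time, a reader that skips its own merge is relying on every writer of the metadata having applied it.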
