cshannon commented on issue #1327:
URL: https://github.com/apache/accumulo/issues/1327#issuecomment-1476947406

   Thanks for the feedback. I'll start exploring the JSON option and see how it 
goes. I may have some follow-up questions about how to handle things as I start 
working on it, but that's a good starting point. 
   
   I was also thinking it would be nice to write the entire DataFileValue as a 
JSON object (the 3 existing fields plus the new list of Ranges). That would not 
be backwards compatible, though: we would need some way to detect the new 
format, plus code to handle both the old and new ways of storing the data, which 
adds complexity and could be tricky. So I'm not sure that's the best approach. 
The alternative is to serialize only the Ranges as a JSON-encoded String and 
append it as the 4th item in the comma-separated representation of 
DataFileValue.
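
A rough sketch of the second option, to show why it can stay backwards compatible. Everything here is an assumption for illustration: `FileValueSketch` is a hypothetical stand-in and not the real Accumulo class, all three existing fields are assumed to always be present, and the JSON shape of the ranges is made up. The only point is that splitting with a limit of 4 leaves the commas inside the JSON untouched, and a value without a 4th item still parses as the old format.

```java
// Hypothetical stand-in for the metadata value; NOT the real Accumulo class.
final class FileValueSketch {
  final long size;
  final long numEntries;
  final long time;
  // JSON-encoded ranges, or null when the value uses the old 3-field format.
  final String rangesJson;

  FileValueSketch(long size, long numEntries, long time, String rangesJson) {
    this.size = size;
    this.numEntries = numEntries;
    this.time = time;
    this.rangesJson = rangesJson;
  }

  // Writes "size,numEntries,time" and appends the JSON only when ranges exist.
  String encode() {
    String base = size + "," + numEntries + "," + time;
    return rangesJson == null ? base : base + "," + rangesJson;
  }

  // Reads both the old (3 items) and the new (4 items) representation.
  static FileValueSketch decode(String encoded) {
    // A limit of 4 keeps the commas inside the JSON from being split.
    String[] parts = encoded.split(",", 4);
    if (parts.length < 3) {
      throw new IllegalArgumentException("expected at least 3 fields: " + encoded);
    }
    String json = parts.length == 4 ? parts[3] : null;
    return new FileValueSketch(Long.parseLong(parts[0]), Long.parseLong(parts[1]),
        Long.parseLong(parts[2]), json);
  }
}
```

With this layout an old value such as `100,5,42` decodes with `rangesJson == null`, so no separate format marker is needed as long as the first three items never contain a comma.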
   
   For contiguous/overlapping ranges, I plan to collapse them when storing. 
Keeping as few ranges as possible should improve performance, since there will 
be fewer ranges to track/manage, and it reduces the data stored in metadata. 
There's already a very nice method called 
[mergeOverlapping](https://github.com/apache/accumulo/blob/540179d1f52dcc478eee3a3ee3c5fac106736c8b/core/src/main/java/org/apache/accumulo/core/data/Range.java#L418)
 in Range to handle this, and I'm using it in my PR for the ranged File Reader 
when 
[constructing](https://github.com/apache/accumulo/blob/f4e5e66df9bca02d951a1801b5e9d459d815aea3/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java#L1590)
 the iterator for fencing. If we merge when storing in metadata, we may be able 
to remove that call from the ranged reader: the stored ranges would already be 
merged, so checking them again would in theory be redundant.
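
To illustrate the redundancy argument, here is a simplified sketch of collapsing ranges. It is not the Accumulo implementation: `IntervalMergeSketch` and `Interval` are hypothetical names, and the intervals are half-open `[start, end)` spans over longs rather than Key-based Ranges with inclusive/exclusive bounds. It only shows that merging is idempotent, so a list merged at write time comes back unchanged if merged again at read time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified illustration only; real Ranges are Key-based, not long-based.
final class IntervalMergeSketch {

  // Half-open interval [start, end).
  static final class Interval {
    final long start;
    final long end;

    Interval(long start, long end) {
      if (start > end) {
        throw new IllegalArgumentException("start must not exceed end");
      }
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + ")";
    }
  }

  // Collapses overlapping or contiguous intervals into the fewest intervals.
  static List<Interval> merge(List<Interval> input) {
    List<Interval> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Interval i) -> i.start));

    List<Interval> merged = new ArrayList<>();
    for (Interval cur : sorted) {
      int lastIdx = merged.size() - 1;
      if (lastIdx >= 0 && cur.start <= merged.get(lastIdx).end) {
        // Overlaps or touches the previous interval, so extend it.
        Interval last = merged.remove(lastIdx);
        merged.add(new Interval(last.start, Math.max(last.end, cur.end)));
      } else {
        merged.add(cur);
      }
    }
    return merged;
  }
}
```

Calling `merge` on its own output returns an equal list, which is the property that would make a second merge in the ranged reader unnecessary if the stored ranges are always merged first.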


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

