[ https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592362#comment-13592362 ]
David Smiley commented on LUCENE-4752:
--------------------------------------

I wonder what other big-data software is also sorting its data files both within-file *and* across them (the latter being the trickier part, I think)? Cassandra, HBase, or Accumulo? The code details are going to be specific to the platform, but I'm interested in the scheduling / merging algorithm, which seems like the biggest challenge to me. I bet this has been solved before. My initial attempts at coming up with an algorithm on my notepad aren't showing much promise.

> Merge segments to sort them
> ---------------------------
>
>                 Key: LUCENE-4752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4752
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: David Smiley
>            Assignee: Adrien Grand
>
> It would be awesome if Lucene could write the documents out in a segment
> based on a configurable order. This of course applies to merging segments
> too. The benefit is increased locality on disk of documents that are likely to
> be accessed together. This often applies to documents near each other in
> time, but also spatially.