[ https://issues.apache.org/jira/browse/LUCENE-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804299#action_12804299 ]
Paul Elschot commented on LUCENE-2232:
--------------------------------------

I tried running contrib/benchmark with the patch applied, using the Reuters data and the sloppy phrase queries there. That did not really show any performance difference. Some further word counting revealed that the average number of words in a Reuters article is just below 129, which puts it in the category of small fields for which hardly any performance difference is expected for queries. Building the index was somewhat slower, as expected.

Then I tried using sloppy phrase queries on the wiki data, but I ran into an "org.xml.sax.SAXParseException: Content is not allowed in prolog" while trying to build an index from enwiki.txt. I think I'm doing something completely wrong here, but I have no idea how to fix it. Could anyone give me some tips on how to continue with this?

Are there enough wiki articles of at least, say, 256 words in the 20070527 wiki dump? Is there a way to generate interesting sloppy phrase queries for the wiki data?

> Use VShort to encode positions
> ------------------------------
>
>                 Key: LUCENE-2232
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2232
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>         Attachments: LUCENE-2232-nonbackwards.patch
>
>
> Improve decoding speed for the typical case of two bytes for a delta position,
> at the cost of increasing the size of the proximity file.
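For readers unfamiliar with the idea behind the issue, below is a minimal, hypothetical sketch of what a VShort-style encoding could look like; it is not the code from LUCENE-2232-nonbackwards.patch, and the class and method names are illustrative only. The assumption shown here is that each 16-bit unit carries 15 payload bits plus a continuation flag in the high bit, so any position delta below 2^15 is written and read as exactly two bytes.

    import java.io.*;

    public class VShortExample {

        // Write a non-negative int as one or more 16-bit units.
        // Values below 2^15 take exactly two bytes.
        static void writeVShort(DataOutput out, int value) throws IOException {
            while ((value & ~0x7FFF) != 0) {
                // Lower 15 bits, with the continuation bit set.
                out.writeShort((short) ((value & 0x7FFF) | 0x8000));
                value >>>= 15;
            }
            out.writeShort((short) value); // final unit, continuation bit clear
        }

        // Read a value written by writeVShort.
        static int readVShort(DataInput in) throws IOException {
            int unit = in.readShort() & 0xFFFF;
            int value = unit & 0x7FFF;
            int shift = 15;
            while ((unit & 0x8000) != 0) {
                unit = in.readShort() & 0xFFFF;
                value |= (unit & 0x7FFF) << shift;
                shift += 15;
            }
            return value;
        }

        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int[] deltas = {1, 100, 20000, 40000};
            for (int d : deltas) writeVShort(out, d);
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            for (int d : deltas) System.out.println(d + " -> " + readVShort(in));
        }
    }

Under that assumption the trade-off matches the issue description: the typical delta is decoded with a single 16-bit read and one branch instead of byte-at-a-time VInt decoding, while small deltas that would fit in one VInt byte now occupy two bytes, making the proximity file larger.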