[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508833 ]
Steven Parkes commented on LUCENE-848:
--------------------------------------
Trying to reproduce now.
Something that came up while restarting the fetch/decompress/extract steps was
the number of files this procedure creates. It's a lot: one for each article. I
reused the existing benchmark code for this, but perhaps it's not a good idea
at this scale? For one thing, it pretty much kills ant, since ant wants to walk
the subtrees for some of its tasks. Either we need to exclude the work and temp
directories from ant's walks, or we should come up with something better than
one file per article.
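On the first option, a rough sketch of what excluding those trees from ant's
scans might look like, assuming the generated files live under work/ and temp/
(both names are just placeholders for whatever we end up using):

    <!-- hypothetical build.xml fragment: keep ant's directory scans
         out of the generated article trees -->
    <defaultexcludes add="work/**"/>
    <defaultexcludes add="temp/**"/>

    <!-- or, scoped to a single fileset -->
    <fileset dir=".">
      <exclude name="work/**"/>
      <exclude name="temp/**"/>
    </fileset>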
I think Mike mentioned not doing one file per article. I'll try to look into
that ...
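In the meantime, here's the kind of thing I have in mind as an alternative
(just a sketch, not necessarily what Mike meant): concatenate everything into
a single file, one article per line, and have the doc maker read lines instead
of walking a tree of files. The file name and the tab separator below are made
up for illustration:

    // Sketch: write all articles to one file, one article per line,
    // instead of one file per article. "articles.txt" and the tab
    // separator are placeholders, not an agreed-on format.
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;

    public class OneLinePerArticleDump {
      private static final String SEP = "\t"; // assumed field separator

      // title/date/body stand in for whatever we extract from the dump
      static void append(Writer out, String title, String date, String body)
          throws IOException {
        // embedded newlines would break the one-line-per-article format
        body = body.replace("\r", " ").replace("\n", " ");
        out.write(title + SEP + date + SEP + body + "\n");
      }

      public static void main(String[] args) throws IOException {
        Writer out = new BufferedWriter(new FileWriter("articles.txt"));
        try {
          append(out, "Anarchism", "2007-07-01", "Anarchism is ...");
          append(out, "Autism", "2007-07-01", "Autism is ...");
        } finally {
          out.close();
        }
      }
    }

The doc maker side would then just read the file a line at a time and split on
the separator, which keeps the article count out of the filesystem entirely.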
> Add support for Wikipedia English as a corpus in the benchmarker stuff
> ------------------------------------------------------------------------
>
> Key: LUCENE-848
> URL: https://issues.apache.org/jira/browse/LUCENE-848
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Steven Parkes
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
> LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
> WikipediaHarvester.java, xerces.jar, xerces.jar, xml-apis.jar
>
>
> Add support for using Wikipedia for benchmarking.