[ 
https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698363#action_12698363
 ] 

Shai Erera commented on LUCENE-1591:
------------------------------------

bq. Does this issue depend on LUCENE-1595?

No, the other way around. Well ... it's not an actual dependency; it's just 
that 1595 will touch a lot of files, including every file this issue touches, 
and I want to minimize the noise of working on two issues that touch the same 
files simultaneously. It's just a matter of convenience ...

Besides, I don't see what else can be done as part of this issue. The 
performance is reasonable and the code is quite simple. The patch includes some 
more enhancements to those files that are unrelated to bzip per se, but are 
still required.

BTW, I successfully executed indexLineFile.alg on the 20070527 one-line bz2 
file and the overall indexing process ended in 1h, which seems reasonable to me.

Regarding Apache Compress, I asked the same question, so it's not fair to 
answer it with a question ;). I don't think we should decide that now. It can 
be changed in 1595 if we think Compress is the better approach. Personally I 
prefer the ant jar, even though I realize it's adding a large dependency for 
just 3-4 classes ...

> Enable bzip compression in benchmark
> ------------------------------------
>
>                 Key: LUCENE-1591
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1591
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: ant-1.7.1.jar, LUCENE-1591.patch, LUCENE-1591.patch
>
>
> bzip compression can aid the benchmark package by not requiring extracting 
> bzip files (such as enwiki) in order to index them. The plan is to add a 
> config parameter bzip.compression=true/false and in the relevant tasks either 
> decompress the input file or compress the output file using the bzip streams.
> It will add a dependency on ant.jar which contains two classes similar to 
> GZIPOutputStream and GZIPInputStream which compress/decompress files using 
> the bzip algorithm.
> bzip is known to be superior in its compression performance to the gzip 
> algorithm (~20% better compression), although it does the 
> compression/decompression a bit slower.
> I will post a patch which adds this parameter and implements it in 
> LineDocMaker, EnwikiDocMaker and WriteLineDoc task. Maybe even add the 
> capability to DocMaker or some of the super classes, so it can be inherited 
> by all sub-classes.
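
To illustrate the plan in the description above, here is a minimal sketch of 
how a bzip.compression flag could decide whether the input and output streams 
get wrapped. It is not the code from the attached patch. The class and helper 
names are hypothetical, and the JDK's GZIP streams are used purely as a 
stand-in so the example runs without ant.jar; with ant.jar on the classpath, 
the bzip2 stream classes from org.apache.tools.bzip2 would presumably take 
their place.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class, not part of contrib/benchmark.
public class CompressionConfigSketch {

    // Reads the flag named in the issue description; defaults to false.
    static boolean compressionEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty("bzip.compression", "false"));
    }

    // Wraps the raw input only when the flag is set.
    // GZIPInputStream is a stand-in for a bzip2 input stream (assumption).
    static InputStream openInput(InputStream raw, Properties config) throws IOException {
        return compressionEnabled(config) ? new GZIPInputStream(raw) : raw;
    }

    // Wraps the raw output only when the flag is set.
    // GZIPOutputStream is a stand-in for a bzip2 output stream (assumption).
    static OutputStream openOutput(OutputStream raw, Properties config) throws IOException {
        return compressionEnabled(config) ? new GZIPOutputStream(raw) : raw;
    }

    // Writes one document per line, roughly what a line-file writer would do.
    static byte[] writeLines(Properties config, String... lines) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(openOutput(sink, config), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
        return sink.toByteArray();
    }

    // Reads the first line back through the same configuration.
    static String readFirstLine(byte[] data, Properties config) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                openInput(new ByteArrayInputStream(data), config), StandardCharsets.UTF_8))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.setProperty("bzip.compression", "true");
        byte[] stored = writeLines(config, "doc one", "doc two");
        System.out.println(readFirstLine(stored, config));
    }
}
```

Keeping the decision in one pair of helpers is what would let the capability 
move up into a shared super class, as the description suggests, without each 
doc maker repeating the check.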

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


