[ 
https://issues.apache.org/jira/browse/COMPRESS-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781875#comment-13781875
 ] 

Thomas Meyer commented on COMPRESS-207:
---------------------------------------

Hi,

I need this for my offline wiki program: the Wikipedia dump is a bzip2 
compressed XML file. On its first run, my program scans through the 
bzip2/XML file and writes two indexes:
1.) The first index maps the start position (in bits) of each bzip2 block 
to the corresponding uncompressed position in bytes.
2.) The second index maps each page title to the start of that page, given 
as an uncompressed position in bytes.

With the first index I can find the bzip2 block that contains the requested 
page. By positioning the underlying input stream at the correct byte offset 
and skipping the remaining bits, my program can decompress the bzip2/XML file 
on the fly and skip to the correct page within that block.
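
To illustrate the lookup, here is a minimal sketch of how the two indexes 
could be combined. The class and method names (BlockIndexSketch, addBlock, 
addPage, locate) are made up for this example and are not part of the patch 
or of Commons Compress; it only shows the arithmetic, not the decompression.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Commons Compress or the attached patch.
public class BlockIndexSketch {
    // First index: uncompressed start of a block (bytes) -> compressed start (bits).
    private final TreeMap<Long, Long> blocks = new TreeMap<>();
    // Second index: page title -> uncompressed start of the page (bytes).
    private final Map<String, Long> pages = new HashMap<>();

    public void addBlock(long uncompressedStartBytes, long compressedStartBits) {
        blocks.put(uncompressedStartBytes, compressedStartBits);
    }

    public void addPage(String title, long uncompressedStartBytes) {
        pages.put(title, uncompressedStartBytes);
    }

    /**
     * Returns {byteOffset, bitsToSkip, bytesToSkipInBlock} for the given title,
     * or null if the title or a covering block is not in the index.
     */
    public long[] locate(String title) {
        Long pagePos = pages.get(title);
        if (pagePos == null) {
            return null;
        }
        // The block containing the page is the last one starting at or before it.
        Map.Entry<Long, Long> block = blocks.floorEntry(pagePos);
        if (block == null) {
            return null;
        }
        long bitPos = block.getValue();
        long byteOffset = bitPos / 8;   // where to position the underlying stream
        long bitsToSkip = bitPos % 8;   // bits to discard before the block starts
        long bytesToSkipInBlock = pagePos - block.getKey();
        return new long[] { byteOffset, bitsToSkip, bytesToSkipInBlock };
    }
}
```

The caller would seek the compressed file to byteOffset, discard bitsToSkip 
bits, start decompressing, and then skip bytesToSkipInBlock uncompressed 
bytes to reach the page.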

> add notifier support for new block in BZip2CompressorInputStream
> ----------------------------------------------------------------
>
>                 Key: COMPRESS-207
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-207
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Compressors
>    Affects Versions: 1.4.1
>            Reporter: Thomas Meyer
>            Priority: Minor
>              Labels: API, bzip
>         Attachments: BZip2CompressorInputStream-add-newBlock-notifier.patch, 
> BZip2CompressorInputStream-add-newBlock-notifier.patch, 
> BZip2CompressorInputStream-add-newBlock-notifier.patch
>
>
> Hi,
> the attached patch enables a program to add a listener that is called
> when a new bzip2 block is detected.
> The notifier is called with:
>  - xxx.newBlock(this, currBlockPosition)
>  - this = the current BZip2CompressorInputStream object
>  - currBlockPosition = the offset (i.e. start position) of the current
>    block in the compressed input stream



