[
https://issues.apache.org/jira/browse/SOLR-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080415#comment-16080415
]
Andrew Lundgren commented on SOLR-10981:
----------------------------------------
Sorry for not getting back on this sooner. I spent quite a bit of time digging
into what handling the post would take.
In doing so I ended up refactoring and consolidating a fair amount of code
around the loaders, more than I think belongs in a small patch
that I was hoping to get back onto the 6.X branch.
I ended up working a fair amount on the StringStream loader, trying to get it to
work with a compressed stream. That ended up as a dead end. (The String the data
goes into seemed to switch to double-byte mode, corrupting the binary contents
of the gzip format.) If I followed it correctly, the code that handles the post
uses that class. That code will need to get smarter so that it can use the header
to determine which input stream it needs to use.
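
As a rough illustration of the problem above, here is a minimal sketch showing
that gzip bytes do not survive a round trip through a String, and how a body
could be wrapped based on a header value instead. It assumes "the header" means
the HTTP Content-Encoding header. GzipBodySketch and its methods are
hypothetical names, not Solr classes and not code from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Solr or the patch.
class GzipBodySketch {

    // Compress text into gzip-formatted bytes.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // True if the bytes come back unchanged after being decoded into a
    // String and encoded again. The gzip magic byte 0x8b is not valid
    // UTF-8 in that position, so it is replaced during decoding.
    static boolean survivesStringRoundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, asText.getBytes(StandardCharsets.UTF_8));
    }

    // Choose the input stream from the Content-Encoding value (may be null),
    // keeping the body as raw bytes the whole time.
    static InputStream open(byte[] body, String contentEncoding) throws IOException {
        InputStream in = new ByteArrayInputStream(body);
        return "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(in) : in;
    }

    // Read a stream fully and decode it as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The point of the sketch is only that decompression has to happen on the byte
stream, before any conversion to characters.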
I will continue working on this, but I think it is best done on another
issue, as the amount of changed code is much larger once the refactoring is included.
What would it take to get the patch that handles gzip URLs/files onto the 6.X
branch?
> Allow update to load gzip files
> --------------------------------
>
> Key: SOLR-10981
> URL: https://issues.apache.org/jira/browse/SOLR-10981
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: 6.6
> Reporter: Andrew Lundgren
> Labels: patch
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-10981.patch, SOLR-10981.patch, SOLR-10981.patch
>
>
> We currently import large CSV files. We store them in gzip files as they
> compress at around 80%.
> To import them we must gunzip them and then import them. After that we no
> longer need the decompressed files.
> This patch allows directly opening either URLs or local files that are
> gzipped.
> For URLs, it treats the content as gzipped if the content encoding is "gzip"
> or if the URL ends in ".gz".
> For files, if the file name ends in ".gz" it assumes the file is gzipped.
> I have tested the patch with 4.10.4, 6.6.0 and master from git.
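
The detection rules in the description above could be sketched roughly as
follows. This is not the code from the attached patch. GzipDetectionSketch and
its method names are hypothetical, and matching ".gz" and "gzip"
case-insensitively is an assumption made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not the patch's implementation.
class GzipDetectionSketch {

    // Assumption: the ".gz" suffix is matched case-insensitively.
    static boolean hasGzExtension(String name) {
        return name != null && name.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    // URL rule: gzipped if Content-Encoding is "gzip" or the path ends in ".gz".
    // contentEncoding may be null when the server sends no such header.
    static boolean isGzippedUrl(String contentEncoding, String urlPath) {
        return "gzip".equalsIgnoreCase(contentEncoding) || hasGzExtension(urlPath);
    }

    // File rule: gzipped if the file name ends in ".gz".
    static boolean isGzippedFile(String fileName) {
        return hasGzExtension(fileName);
    }

    // Open a local file, decompressing on the fly when the name says gzip.
    static InputStream openFile(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        if (!isGzippedFile(file.getFileName().toString())) {
            return in;
        }
        try {
            return new GZIPInputStream(in);
        } catch (IOException e) {
            in.close(); // do not leak the file handle if the gzip header is bad
            throw e;
        }
    }
}
```

With something like this, a caller could read a compressed CSV through
openFile without first writing a decompressed copy to disk.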