[ 
https://issues.apache.org/jira/browse/SOLR-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100004#comment-14100004
 ] 

Uwe Schindler commented on SOLR-3213:
-------------------------------------

Hi,
the whole thing is not easily possible. The older commons-csv bundled with 
Solr is totally different from what was released as 1.0. This could 
theoretically be fixed API-wise, but there is another problem:
For "mv" (multi-valued) fields, the underlying CSV parser is misused as a 
parser inside a parser. This cannot be done with commons-csv, because the new 
commons-csv API is completely based on Iterable, and invoking a new parser is 
quite heavy. It is also not easily possible to switch between CSVPrinters for 
one output Writer: you would need to do the same as for parsing, that is, 
create a new CSVPrinter instance per multi-value that writes to a 
StringWriter, and then feed that StringWriter's value down to the main 
CSVPrinter.
All of this is only possible in a performant way with a complete rewrite of 
the CSV components in Solr.
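
To make the data flow of the printer-per-multi-value workaround concrete, 
here is a minimal sketch. It uses Python's standard csv module as a stand-in, 
not commons-csv and not Solr's actual code. The function names and the "|" 
separator for multi-valued fields are assumptions made for illustration only.

```python
import csv
import io


def encode_multi_value(values, separator="|"):
    """Encode one multi-valued field as a single-line CSV record (hypothetical helper)."""
    buffer = io.StringIO()
    # A fresh writer and buffer per field: this is the per-value overhead.
    inner = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    inner.writerow(values)
    # Drop exactly the one trailing line terminator added by writerow().
    return buffer.getvalue()[:-1]


def write_document(out, doc):
    """Write one row; list values are pre-encoded by a nested writer."""
    outer = csv.writer(out, lineterminator="\n")
    row = [encode_multi_value(v) if isinstance(v, list) else v for v in doc]
    # The outer writer quotes the nested record again where needed.
    outer.writerow(row)


out = io.StringIO()
write_document(out, ["doc1", ["red", "light, blue"], "10"])
print(out.getvalue(), end="")
```

Every multi-valued field allocates its own buffer and writer, and its 
content is escaped twice, once per level.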

I am not sure how to handle this:
- Keep the current CSV parser/printer? If we do this, we should maybe move it 
completely into Solr's package structure and not use "internal" as the package 
name. This would just be a simple Eclipse rename. We should then add a 
NOTICE.txt entry that refers to commons-csv and states that we have a forked, 
older version of this component that was modified for performance.
- Use commons-csv in the future, but completely rewrite the CSV handlers in 
Solr? This is especially hard for multi-valued fields. The new API of 
commons-csv looks much better than the old one, but it is more restricted 
(formats are stateless; parsers just implement Iterable and do not allow 
looking into their internals), so it is not easy to do the crazy 
record-in-value stuff for multi-valued fields.
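
The "record-in-value" idea on the parsing side can be sketched as follows. 
Again this uses Python's standard csv module as a stand-in rather than 
commons-csv or Solr's handlers. The helper names, the "|" separator, and the 
column index set are hypothetical.

```python
import csv
import io


def split_multi_value(raw, separator="|"):
    """Run a second, short-lived reader over a single field value."""
    if raw == "":
        return []
    # A new reader per field value: the "parser inside parser" pattern.
    return next(csv.reader(io.StringIO(raw), delimiter=separator))


def read_documents(source, multi_valued_columns):
    """Yield rows; fields in multi_valued_columns are parsed a second time."""
    for row in csv.reader(source):
        yield [
            split_multi_value(cell) if index in multi_valued_columns else cell
            for index, cell in enumerate(row)
        ]


data = io.StringIO('doc1,"red|light, blue",10\n')
rows = list(read_documents(data, {1}))
print(rows)
```

The outer reader only ever sees the multi-valued field as an opaque string, 
so the nested parse cannot reuse any state of the outer parser.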

> Upgrade to commons-csv once it is released
> ------------------------------------------
>
>                 Key: SOLR-3213
>                 URL: https://issues.apache.org/jira/browse/SOLR-3213
>             Project: Solr
>          Issue Type: Task
>          Components: Build
>            Reporter: Uwe Schindler
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-3213.patch
>
>
> Since SOLR-3204 we have a jarjar'ed apache-solr-commons-csv-SNAPSHOT.jar file 
> in lib folder. Once version 1.0 of commons-csv is officially released, we 
> should upgrade that to this version, remove maven publishing and change the 
> import statements to the official package name in java files.



