[jira] Commented: (SOLR-1925) CSV Response Writer

Chris A. Mattmann (JIRA) Wed, 14 Jul 2010 19:01:49 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888670#action_12888670
 ]


Chris A. Mattmann commented on SOLR-1925:
-----------------------------------------

Hi Yonik:

Thanks. Replies below:

{quote}
    *  loses info by removing newlines
{quote}

Only does this when {noformat}&excel=true{noformat}, and actually adds 
functionality in doing so (without doing this, you can't load the data into 
Excel, see my comments above and in the code).

{quote}
    * always encapsulates with quotes - not as readable
{quote}

See the CSV spec (via the Wikipedia links in the code). Doing so reduces 
ambiguity, and clearly delineates where the value starts and where it stops.
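
As a rough illustration of the always-quote approach (a sketch only, not the code from the patch; the class and method names are hypothetical), every value is wrapped in double quotes. RFC 4180 additionally doubles any double quote that occurs inside a value, and the sketch follows that convention, which is an assumption on top of what is described here:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the SOLR-1925 patch.
final class CsvQuoting {
    // Wrap a value in double quotes; embedded quotes are doubled (RFC 4180 style).
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Quote every value and join the results with commas to form one row.
    static String joinRow(List<String> values) {
        return values.stream()
                     .map(CsvQuoting::quote)
                     .collect(Collectors.joining(","));
    }
}
```

With this, a value such as a,b stays a single cell because the separator sits inside the quotes.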

{quote}
    * doesn't escape encapsulator in values
{quote}

Is there a need to do this? I don't think so...

{quote}
    * doesn't escape separator in multi-valued fields
{quote}

Same as above: no need, really.

{quote}
    * isn't really nested CSV, so it's not compatible with the CSVLoader
{quote}

What do you mean by not compatible with the CSVLoader?

{quote}
    * uses System.getProperty("line.separator")... we should avoid different 
behavior on different platforms
{quote}

Hmm, I've never been dinged before for writing platform-independent code. 
That's why they put the property in there, so line.separator means the same 
thing, programming-construct wise, across platforms. So, I don't really get 
your ding here.
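
To make the two positions concrete, here is a small sketch (hypothetical class and method names, not taken from the patch). The property lookup yields whatever the host platform uses, so the bytes written can differ between, say, Windows and Linux, whereas a fixed terminator gives identical output everywhere:

```java
// Hypothetical illustration, not part of the SOLR-1925 patch.
final class LineEndings {
    // A fixed terminator: output is byte-for-byte the same on every platform.
    static final String FIXED_EOL = "\n";

    // Terminator taken from the JVM's platform setting ("\r\n" on Windows, "\n" on Unix-like systems).
    static String withPlatformEol(String row) {
        return row + System.getProperty("line.separator");
    }

    // Terminator fixed in code, independent of where the server runs.
    static String withFixedEol(String row) {
        return row + FIXED_EOL;
    }
}
```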

{quote}
    * doesn't stream documents (dumping your entire index will be one use case)
{quote}

I actually implemented both the streaming method (#writeDoc) and the aggregate 
method (#writeAllDocs). I set #isStreaming to false, because it makes for 
clean CSV header writing, rather than hacky code in #writeDoc to take care of 
the (potential) non-uniformity of fields across documents. Additionally, I'm 
using this in production right now, on the solr-1.5 branch with an index of 
over 1M documents, and the write performance is quite fast.

{quote}
    * performance: patterns shouldn't be compiled per-doc
{quote}

This only matters when {noformat}excel=true{noformat}, and I think the 
performance hit isn't really an issue. If you feel strongly about it though we 
could always compile the pattern above the loop, and reuse it...
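
A sketch of what compiling the pattern once could look like (hypothetical class name; the exact regex and the choice to replace newlines with a space are assumptions, not taken from the patch). A compiled java.util.regex.Pattern is immutable and safe to share, so it can live in a static field and be reused for every document:

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not part of the SOLR-1925 patch.
final class NewlineStripper {
    // Compiled once at class load instead of once per document.
    private static final Pattern NEWLINES = Pattern.compile("\\r\\n|\\r|\\n");

    // Replace each line break in a value with a single space (assumed replacement).
    static String strip(String value) {
        return NEWLINES.matcher(value).replaceAll(" ");
    }
}
```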

> CSV Response Writer
> -------------------
>
>                 Key: SOLR-1925
>                 URL: https://issues.apache.org/jira/browse/SOLR-1925
>             Project: Solr
>          Issue Type: New Feature
>          Components: Response Writers
>         Environment: indep. of env.
>            Reporter: Chris A. Mattmann
>            Assignee: Erik Hatcher
>             Fix For: Next
>
>         Attachments: SOLR-1925.Chheng.071410.patch.txt, 
> SOLR-1925.Mattmann.053010.patch.2.txt, SOLR-1925.Mattmann.053010.patch.3.txt, 
> SOLR-1925.Mattmann.053010.patch.txt, SOLR-1925.Mattmann.061110.patch.txt
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It 
> currently takes all the docs resulting from a query and then outputs their 
> metadata in simple CSV format. The use of a delimiter is configurable (by 
> default if there are multiple values for a particular field they are 
> separated with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


