[ https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888670#action_12888670 ]
Chris A. Mattmann commented on SOLR-1925:
-----------------------------------------

Hi Yonik:

Thanks. Replies below:

{quote}
* loses info by removing newlines
{quote}

It only does this when {noformat}&excel=true{noformat}, and it actually adds functionality in doing so: without it, you can't load the data into Excel (see my comments above and in the code).

{quote}
* always encapsulates with quotes - not as readable
{quote}

See the CSV spec, linked via Wikipedia in the code. Always quoting reduces ambiguity and clearly delineates where each value starts and where it stops.

{quote}
* doesn't escape encapsulator in values
{quote}

Is there a need to do this? I don't think so...

{quote}
* doesn't escape separator in multi-valued fields
{quote}

Same as above: no need, really.

{quote}
* isn't really nested CSV, so it's not compatible with the CSVLoader
{quote}

What do you mean by "not compatible with the CSVLoader"?

{quote}
* uses System.getProperty("line.separator")... we should avoid different behavior on different platforms
{quote}

Hmm, I've never been dinged before for writing platform-independent code. That's why they put the property in there: so that line.separator means the same thing, programming-construct-wise, across platforms. So I don't really get your ding here.

{quote}
* doesn't stream documents (dumping your entire index will be one use case)
{quote}

I actually implemented both the streaming method (#writeDoc) and the aggregate method (#writeAllDocs). I set #isStreaming to false because it makes writing the CSV header clean, rather than requiring hacky code in #writeDoc to handle the (potential) non-uniformity of fields across documents. Additionally, I'm using this in production right now, on the solr-1.5 branch with an index of over 1M documents, and the write is quite fast.

{quote}
* performance: patterns shouldn't be compiled per-doc
{quote}

This only matters when {noformat}excel=true{noformat}, and I think the performance hit isn't really an issue.
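On the per-doc pattern point, here is a minimal Java sketch of compiling the newline pattern once and reusing it for every value. It assumes that the excel=true handling replaces each line break with a single space. The class and method names (ExcelNewlineStripper, stripNewlines) are hypothetical and do not come from the attached patches.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ExcelNewlineStripper {
    // Compiled once when the class loads, not once per document.
    // Hypothetical: matches CRLF, lone CR, or lone LF as one line break.
    private static final Pattern NEWLINES = Pattern.compile("\\r\\n|\\r|\\n");

    /** Replaces each line break in the value with a single space (assumed behavior). */
    static String stripNewlines(String value) {
        return NEWLINES.matcher(value).replaceAll(" ");
    }

    public static void main(String[] args) {
        List<String> values = List.of("line one\nline two", "a\r\nb", "plain");
        for (String v : values) {
            // The same compiled Pattern is reused for every value in the loop.
            System.out.println(stripNewlines(v));
        }
    }
}
```

Because CRLF is listed first in the alternation, a Windows-style line break collapses to one space rather than two.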
If you feel strongly about it, though, we could always compile the pattern once above the loop and reuse it...

> CSV Response Writer
> -------------------
>
> Key: SOLR-1925
> URL: https://issues.apache.org/jira/browse/SOLR-1925
> Project: Solr
> Issue Type: New Feature
> Components: Response Writers
> Environment: indep. of env.
> Reporter: Chris A. Mattmann
> Assignee: Erik Hatcher
> Fix For: Next
>
> Attachments: SOLR-1925.Chheng.071410.patch.txt, SOLR-1925.Mattmann.053010.patch.2.txt, SOLR-1925.Mattmann.053010.patch.3.txt, SOLR-1925.Mattmann.053010.patch.txt, SOLR-1925.Mattmann.061110.patch.txt
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It currently takes all the docs resultant from a query and then outputs their metadata in simple CSV format. The delimiter is configurable (by default, if there are multiple values for a particular field, they are separated with a | symbol).
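The issue description mentions a configurable delimiter for multi-valued fields, defaulting to |. Here is a minimal Java sketch that formats one such field as a single quoted CSV value. The names (MultiValuedCsvField, format, DEFAULT_MV_SEPARATOR) are hypothetical. The doubling of embedded quotes follows RFC 4180 and is not confirmed by anything in the attached patches.

```java
import java.util.List;

public class MultiValuedCsvField {
    // Hypothetical default, matching the "|" described in the issue.
    static final String DEFAULT_MV_SEPARATOR = "|";

    /**
     * Joins the values of one multi-valued field with the given separator and
     * wraps the result in double quotes. Embedded double quotes are doubled,
     * as RFC 4180 requires inside quoted fields. The separator itself is NOT
     * escaped, so a value containing it becomes ambiguous.
     */
    static String format(List<String> values, String separator) {
        String joined = String.join(separator, values);
        return "\"" + joined.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // prints: "red|green|blue"
        System.out.println(format(List.of("red", "green", "blue"), DEFAULT_MV_SEPARATOR));
        // prints: "say ""hi"""
        System.out.println(format(List.of("say \"hi\""), DEFAULT_MV_SEPARATOR));
    }
}
```

A single value that itself contains the separator (for example a|b) comes out identical to two separate values here. That is the multi-valued escaping question raised in the review.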