Odd response format when using extractOnly option with Solr Cell
----------------------------------------------------------------
Key: SOLR-2217
URL: https://issues.apache.org/jira/browse/SOLR-2217
Project: Solr
Issue Type: Bug
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 1.4.1
Environment: Ubuntu 10.04 LTS (Lucid), Java version "1.6.0_18" OpenJDK
Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2) OpenJDK 64-Bit
Server VM (build 16.0-b13, mixed mode), Tomcat 6
Reporter: Donovan Jimenez
Priority: Minor
When the extractOnly request parameter is used,
oas.handler.extraction.ExtractingDocumentLoader uses stream.getName() to name
parts of the response. That name appears to be null: the serialized response
contains an unnamed string and a list named "null_metadata". It seems more
appropriate either to use "content" (producing a named string "content" and a
list "content_metadata") or to use whatever name
oas.handler.extraction.SolrContentHandler uses for the content field
(it defaults to "content", but can be mapped through request parameters).
The relevant code in ExtractingDocumentLoader (the two stream.getName() calls
are the ones in question):

    rsp.add(stream.getName(), writer.toString());  // name appears to be null
    writer.close();
    String[] names = metadata.names();
    NamedList metadataNL = new NamedList();
    for (int i = 0; i < names.length; i++) {
      String[] vals = metadata.getValues(names[i]);
      metadataNL.add(names[i], vals);
    }
    rsp.add(stream.getName() + "_metadata", metadataNL);  // "null_metadata"
This change is mostly to avoid having to handle the odd empty-string and
"null_metadata" keys in deserialized response data (JSON, PHP, Ruby, etc.).
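
To illustrate the naming problem and the suggested fallback, here is a minimal
standalone sketch. It does not use Solr classes: the class ExtractOnlyKeys, the
constant DEFAULT_NAME, and the methods below are hypothetical names invented
for this example, and a plain Map stands in for the response object. It only
shows how a null stream name turns into "null_metadata" under Java string
concatenation, and how a "content" fallback would produce stable keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractOnlyKeys {
    // Assumption: "content" is the fallback suggested in the report.
    static final String DEFAULT_NAME = "content";

    // Mirrors the reported behavior: concatenating a null String
    // yields the literal text "null".
    static String currentMetadataKey(String streamName) {
        return streamName + "_metadata";
    }

    // Suggested behavior: fall back to a fixed name when the stream has none.
    static String baseName(String streamName) {
        return (streamName == null || streamName.isEmpty()) ? DEFAULT_NAME : streamName;
    }

    // Builds a stand-in response with the text and its metadata under named keys.
    static Map<String, Object> buildResponse(String streamName, String text,
                                             Map<String, String[]> metadata) {
        String name = baseName(streamName);
        Map<String, Object> rsp = new LinkedHashMap<>();
        rsp.put(name, text);
        rsp.put(name + "_metadata", metadata);
        return rsp;
    }

    public static void main(String[] args) {
        System.out.println(currentMetadataKey(null)); // prints "null_metadata"

        Map<String, String[]> md = new LinkedHashMap<>();
        md.put("Content-Type", new String[] {"text/plain"});
        // prints "[content, content_metadata]"
        System.out.println(buildResponse(null, "extracted text", md).keySet());
    }
}
```

Whether the fallback should be the fixed string "content" or the mapped content
field name from SolrContentHandler is left open here, as it is in the report.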