Odd response format when using extractOnly option with Solr Cell
----------------------------------------------------------------

                 Key: SOLR-2217
                 URL: https://issues.apache.org/jira/browse/SOLR-2217
             Project: Solr
          Issue Type: Bug
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 1.4.1
         Environment: Ubuntu 10.4 LTS (Lucid), Java version "1.6.0_18" OpenJDK 
Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2) OpenJDK 64-Bit 
Server VM (build 16.0-b13, mixed mode), Tomcat 6
            Reporter: Donovan Jimenez
            Priority: Minor


When the extractOnly request parameter is used, 
oas.handler.extraction.ExtractingDocumentLoader uses stream.getName() to name 
parts of the response. That name appears to be null: the serialized response 
contains an unnamed string and a list named "null_metadata". It seems more 
appropriate to use "content" (producing a named string "content" and a list 
"content_metadata"), or to use whatever name 
oas.handler.extraction.SolrContentHandler uses for the content field 
(hard-coded to "content", but mappable by request parameters).

    // stream.getName() appears to be null here, so the string is added unnamed
    rsp.add(stream.getName(), writer.toString());
    writer.close();
    String[] names = metadata.names();
    NamedList metadataNL = new NamedList();
    for (int i = 0; i < names.length; i++) {
      String[] vals = metadata.getValues(names[i]);
      metadataNL.add(names[i], vals);
    }
    // a null name concatenated with "_metadata" yields "null_metadata"
    rsp.add(stream.getName() + "_metadata", metadataNL);

This is mostly to avoid having to use the odd empty-string and "null_metadata" 
identifiers in deserialized data (from the JSON, PHP, Ruby, etc. response 
formats).
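
A rough sketch of the suggested fallback, for illustration only. This is not 
the Solr code: the class ExtractOnlyNaming, the helper resolveName, and the 
plain Map standing in for the response object are all hypothetical. It shows 
why a null stream name produces the "null_metadata" key (Java string 
concatenation renders null as "null") and how defaulting to "content" would 
give the "content" / "content_metadata" pair instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractOnlyNaming {
    // Assumed default, mirroring the "content" field name mentioned above.
    static final String DEFAULT_NAME = "content";

    // Hypothetical helper: fall back to the default when the stream is unnamed.
    static String resolveName(String streamName) {
        return (streamName == null || streamName.isEmpty()) ? DEFAULT_NAME : streamName;
    }

    // Stand-in for building the extractOnly response; a Map replaces the real response object.
    static Map<String, Object> buildResponse(String streamName, String extracted,
                                             Map<String, String[]> metadata) {
        Map<String, Object> rsp = new LinkedHashMap<>();
        String name = resolveName(streamName);
        rsp.put(name, extracted);
        rsp.put(name + "_metadata", metadata);
        return rsp;
    }

    public static void main(String[] args) {
        String streamName = null;
        // Behavior described in this issue: null is rendered as the text "null".
        System.out.println(streamName + "_metadata"); // prints "null_metadata"
        // With the fallback, both keys get a usable name.
        Map<String, Object> rsp = buildResponse(streamName, "extracted text",
                new LinkedHashMap<String, String[]>());
        System.out.println(rsp.keySet()); // prints "[content, content_metadata]"
    }
}
```

Using a name that honors the request-parameter field mapping, rather than a 
fixed "content", would be an alternative; the sketch only covers the simpler 
default.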


