Hi Dileepa,

For ManifoldCF to index metadata, you need to set metadata field
values on the RepositoryDocument object, not send Solr JSON as the
document's content.  In fact, from your example it looks like you want no
content at all.

Please read the RepositoryDocument Javadoc to see how to set metadata.
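
As a rough sketch of the idea, using the values from your example: keep the
extracted values as plain name/value pairs and hand each one to the
repository document as a metadata field, rather than serializing them to
JSON. The class and method names below (MetadataFieldsSketch,
extractedMetadata) are made up for illustration; the only ManifoldCF calls
referred to, in comments, are RepositoryDocument.addField(String, String)
and setBinary(InputStream, long). This covers plain field values only; it
does not show how to express an atomic-update modifier such as "add".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF: it only collects the
// metadata that a transformation connector has extracted.
final class MetadataFieldsSketch {

    // Returns field name -> value, in insertion order.
    static Map<String, String> extractedMetadata(String id, String label, String document) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("label", label);
        fields.put("documents", document);
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = extractedMetadata(
            "http://dbpedia.org/resource/Africa", "Africa", "sample2.txt");

        // Inside the connector, each entry would become a metadata field
        // on the RepositoryDocument instead of part of a JSON body:
        //   repoDoc.addField(name, value);
        // and the document content would be left empty (assumption), e.g.:
        //   repoDoc.setBinary(new ByteArrayInputStream(new byte[0]), 0L);
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```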

Karl


On Mon, Aug 10, 2015 at 9:05 AM, Dileepa Jayakody <[email protected]>
wrote:

> Hi All,
>
> We have a requirement to extract some meta-data from content documents and
> index those meta-data as separate documents into a Solr index.
> I'm writing a transformation connector where I construct a new repository
> document adding the meta-data extracted by the connector and hand it over
> to mcf-solr-connector to index in Solr.
> Currently I face some difficulties with indexing these new documents in
> Solr properly using solr-connector.
>
> The new Solr document should contain atomic updates for certain
> fields. So in my connector I create a JSON string representing the Solr
> atomic update request and set it as the binaryStream of the repository
> document. The JSON string for the new Solr document is as below:
>
> String jsonString = "[{\"id\":\"http://dbpedia.org/resource/Africa\","
>     + "\"label\":\"Africa\",\"documents\":{\"add\":\"sample2.txt\"}}]";
>
>
> Then I add an id and set the above jsonString as the binary input stream
> of the repo-document as follows:
>
> repoDoc.addField( "id", idString );
> InputStream inputStream = IOUtils.toInputStream( jsonString );
> repoDoc.setBinary(inputStream, jsonString.getBytes().length);
>
> The expected behavior is that the Solr connector sends the
> SolrInputDocument constructed from the inputStream I added to the
> repo-document in my connector. Instead, it adds the JSON string to the
> 'content' field of the Solr document and sends that to Solr.
>
> When I monitored the HTTP request from ManifoldCF to Solr, I saw the
> following:
>
> POST /solr/core1/update?wt=xml&version=2.2 HTTP/1.1
> <add>
>    <doc boost="1.0">
>       <field name="id">http://dbpedia.org/resource/Africa</field>
>       <field name="_root_">[{"id":"http://dbpedia.org/resource/Africa
> ","label":"Africa","documents":{"add":"sample2.txt"}}]</field>
>       <field name="lcf_metadata_id">http://dbpedia.org/resource/Africa
> </field>
>    </doc></add>0
>
> Please note that the 'content' field configured in manifoldcf is *_root_*.
>
> But the expected Solr update request from solr-connector should be as
> below:
> <add>
>    <doc boost="1.0">
>       <field name="id">http://dbpedia.org/resource/Africa</field>
>       <field name="label">Africa</field>
>       <field name="documents" update="add">sample2.txt</field>
>       <field name="lcf_metadata_id">http://dbpedia.org/resource/Africa</field>
>    </doc></add>0
>
>
> Can someone please give some advice on how to use solr atomic updates with
> manifoldcf solr-connector? Have I missed some configurations/arguments?
>
> Thanks,
> Dileepa
