Thank you, Harry.

Rushabh Mehta
Tata Consultancy Services
Cell:- 9989116577
Mailto: [email protected]
Website: http://www.tcs.com

 
At first glance, I would recommend storing the metadata in their own elements.
You can always format it in xhtml if needed later. The advantage is in faceting
and search. If everything is in a <p> element, it will be hard to make your
search specific.

I might also suggest that your content and metadata be stored in one document
perhaps something like...

<imported-document>
  <content>
    [xml of document that is generated in CPF workflow]
  </content>
  <metadata>
    <source-file>my-file-imported-through-cpf.xlsx</source-file>
    <document-type>xlsx</document-type>
    <country>United States</country>
    <region>Western</region>
    <business>Pharmaceutical Sales</business>
    <source>...</source>
    ...
  </metadata>
</imported-document>

You can also separate the metadata from the content, but regardless, this type
of structure supports facets and filters to make your search more powerful. If
your search is ALWAYS going to be keyword-driven and only against the content,
then your approach is fine. If you want to be able to develop faceted search,
filtering by metadata, etc., then data structure like I illustrated will better
support that and make it very easy to do in MarkLogic.

Hope this helps,
Harry
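
As a rough illustration of why separate metadata elements help, here is a small sketch in plain Python (standard library only). It is not MarkLogic code and does not use any MarkLogic API; in MarkLogic the indexes and search features would do this work on the server. The helper names (wrap, search, facet) and the sample documents are made up for the example. It builds the envelope structure described above, runs a keyword match against the content only, and then counts and filters on the metadata elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def wrap(content_xml, metadata):
    """Build an <imported-document> envelope: the converted content
    plus one child element per metadata field (hypothetical helper)."""
    doc = ET.Element("imported-document")
    content = ET.SubElement(doc, "content")
    content.append(ET.fromstring(content_xml))
    meta = ET.SubElement(doc, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    return doc


def search(docs, keyword, filters=None):
    """Case-insensitive keyword match on the content only, followed by
    exact-match filters on metadata elements."""
    filters = filters or {}
    hits = []
    for doc in docs:
        text = "".join(doc.find("content").itertext()).lower()
        if keyword.lower() not in text:
            continue
        if all(doc.findtext(f"metadata/{name}") == value
               for name, value in filters.items()):
            hits.append(doc)
    return hits


def facet(docs, name):
    """Count the values of one metadata element across a result set."""
    return Counter(doc.findtext(f"metadata/{name}") for doc in docs)


# Made-up sample documents, for illustration only.
docs = [
    wrap("<html><body><p>Quarterly sales of aspirin rose.</p></body></html>",
         {"country": "United States", "region": "Western"}),
    wrap("<html><body><p>Aspirin sales were flat.</p></body></html>",
         {"country": "India", "region": "Southern"}),
    wrap("<html><body><p>Travel policy update.</p></body></html>",
         {"country": "United States", "region": "Eastern"}),
]

hits = search(docs, "aspirin")
country_facet = facet(hits, "country")
india_hits = search(docs, "aspirin", {"country": "India"})
```

Because each metadata value lives in its own element, the facet counts and the filter are simple lookups by element name. With everything inside <p>, there would be no reliable way to tell a country name in the metadata from the same words in the body text.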


To: [email protected]
From: Rushabh M/HYD/TCS
Date: 06/12/2013 07:30PM
Subject: Text search and facets on Metadata

 Hello All,

I am taking my baby steps in MarkLogic and working on a POC where several
thousand documents (Word/PDF/Excel/PPT) need to be loaded into MarkLogic
with the necessary metadata (such as country, region, line of business, market...).
The user will perform a text search on the content of the documents, and the
results should display the list of matching documents along with facets based
on the metadata.

My approach is to use MarkLogic CPF to convert the Word/PDF/PPT/Excel files to
XHTML documents and to store the metadata as elements in the properties of each
XHTML document. In the XHTML, the content is stored in <p> elements. I am
planning to run an element query on <p> and fetch the associated metadata from
the properties document.

Please suggest a better approach if you have one, and let me know how to define
constraints on the metadata elements to display facets.


Thanks in Advance,
Rushabh
 


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
