At first glance, I would recommend storing each piece of metadata in its
own element. You can always format it as XHTML later if needed. The
advantage is in faceting and search: if everything is in a <p> element, it
will be hard to make your search specific.

I might also suggest that your content and metadata be stored in one
document, perhaps something like this:

<imported-document>
  <content>
    [xml of document that is generated in CPF workflow]
  </content>
  <metadata>
    <source-file>my-file-imported-through-cpf.xlsx</source-file>
    <document-type>xlsx</document-type>
    <country>United States</country>
    <region>Western</region>
    <business>Pharmaceutical Sales</business>
    <source>...</source>
    ...
  </metadata>
</imported-document>

You can also separate the metadata from the content, but regardless, this
type of structure supports facets and filters to make your search more
powerful. If your search is ALWAYS going to be keyword-driven and only
against the content, then your approach is fine. If you want to be able to
develop faceted search, filtering by metadata, etc., then a data structure
like the one I illustrated will better support that and make it very easy
to do in MarkLogic.
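
As a rough sketch of the faceting side, assuming you use the Search API and
have already created string element range indexes on <country> and <region>
with the collation shown, search options along these lines would expose
those metadata elements as constraints and facets. The constraint names and
the choice of these two elements are only for illustration, and the empty
ns assumes the metadata elements are in no namespace, as in the structure
above.

```xml
<!-- Sketch only: assumes string element range indexes already exist on
     <country> and <region>, using the collation named below. -->
<options xmlns="http://marklogic.com/appservices/search">
  <!-- Each range constraint is backed by a range index; facet="true"
       asks for its values to be returned as a facet. -->
  <constraint name="country">
    <range type="xs:string" collation="http://marklogic.com/collation/" facet="true">
      <element ns="" name="country"/>
    </range>
  </constraint>
  <constraint name="region">
    <range type="xs:string" collation="http://marklogic.com/collation/" facet="true">
      <element ns="" name="region"/>
    </range>
  </constraint>
  <return-facets>true</return-facets>
</options>
```

Passed as the second argument to search:search, these options would let a
query string such as "pharmaceutical region:Western" combine a keyword
search with a filter on <region>, and the response would carry facet values
for both constraints.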

Hope this helps,
Harry




Harry Bakken
Avalon Consulting, LLC
[email protected]
801.792.6896



On Wed, Jun 12, 2013 at 8:00 AM, Rushabh M <[email protected]> wrote:

> Hello All,
>
> I am taking my baby steps in MarkLogic and working on a POC where some
> thousands of documents (Word/PDF/Excel/PPT) need to be loaded into MarkLogic
> with the necessary metadata (such as country, region, line of business,
> market...). The user will perform a text search on the content of the
> documents, and the results should display the list of documents along with
> facets based on the metadata.
>
> My approach is to use MarkLogic CPF to convert the Word/PDF/PPT/Excel files
> to XHTML documents and to store the metadata as elements in the properties
> of each XHTML document.
> In XHTML the content is stored in <p> elements. I am planning to perform an
> element search query on <p> and fetch the associated metadata from the
> properties document.
>
> Please suggest a better approach if you have one, and let me know how to
> define constraints on the metadata elements to display facets.
>
>
> Thanks in Advance,
> Rushabh
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>
>