Faron,

That *is* a lot of range indexes. I have never created that many myself, and 
would avoid 1000+ range indexes, or at least test it out before using that 
approach. If you do need to keep that schema, you may look at using 
co-occurrence to show the tag/value combinations that are most common. 
Co-occurrence can be good at this kind of two-part facet.
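
As a rough illustration of what a co-occurrence report over tag name/value 
pairs would contain, here is a minimal Python sketch. It is not MarkLogic code: 
the sample documents and the function name tag_value_cooccurrences are made up 
for illustration, and inside MarkLogic the counts would come from the server's 
indexes rather than from parsing each document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample documents in the generic tagList schema from this thread.
DOCS = [
    '<item><tagList><stringTag name="color">Gold</stringTag></tagList></item>',
    '<item><tagList><stringTag name="color">Gold</stringTag>'
    '<dateTag name="approved">20110101</dateTag></tagList></item>',
    '<item><tagList><stringTag name="color">Red</stringTag></tagList></item>',
]


def tag_value_cooccurrences(docs):
    """Count how often each (tag name, value) pair appears across documents."""
    counts = Counter()
    for doc in docs:
        root = ET.fromstring(doc)
        for tag in root.iter():
            name = tag.get("name")
            if name is not None:  # only elements that carry a name attribute
                counts[(name, (tag.text or "").strip())] += 1
    return counts


counts = tag_value_cooccurrences(DOCS)
# List the combinations from most to least common.
for (name, value), n in counts.most_common():
    print(f"{name}={value}: {n}")
```

The most frequent pairs are the natural candidates for a two-part facet.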

Yours,
Damon

From: [email protected] 
[mailto:[email protected]] On Behalf Of Fullbright, Faron
Sent: Wednesday, April 25, 2012 6:09 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Difficulty modeling our data in MarkLogic

Thanks for the quick response, Damon.

Unfortunately, due to the nature of our data, the number of potential names can 
be rather large (on the order of thousands across all clients).  We are 
dealing with a requirement that the set of names we allow be unbounded, and new 
names can appear in every client feed we process.  Additionally, the same name 
can be associated with multiple data types.

Assuming we can get around the issue of multiple data types per tag (which may 
be doable) and assuming dynamically creating new range indexes would be 
reasonably fast, how many range indexes could MarkLogic reasonably support?

Faron

From: [email protected] 
[mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Wednesday, April 25, 2012 4:32 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Difficulty modeling our data in MarkLogic

Faron,

Ideally, your data will use the "eXtensible" feature of XML to handle new tags. 
Your data structure is a kind of meta-xml where the XML generically describes 
XML:

<tagList>
                <dateTag name="approved">20110101</dateTag>
                <stringTag name="color">Gold</stringTag>
</tagList>

might more simply be represented as

<tagList>
                <approved>20110101</approved>
                <color>Gold</color>
</tagList>

This approach could not work in a relational DB, because you would need new 
columns for every new key, but it's fine to add new XML elements in most 
contexts. You will need to add range indexes for each custom field, but they 
will only "reindex" documents that contain the fields in question.
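
A minimal Python sketch of that reshaping, from generic tag elements to one 
element per tag name, might look like the following. The function name 
to_named_elements is hypothetical, and the sketch assumes every name attribute 
is already a valid XML element name, which real client data may not guarantee.

```python
import xml.etree.ElementTree as ET

# The generic "meta-XML" form discussed in this thread.
GENERIC = """<tagList>
  <dateTag name="approved">20110101</dateTag>
  <stringTag name="color">Gold</stringTag>
</tagList>"""


def to_named_elements(tag_list_xml):
    """Rewrite <xxxTag name="n">v</xxxTag> children as <n>v</n>."""
    source = ET.fromstring(tag_list_xml)
    target = ET.Element(source.tag)
    for child in source:
        # Assumption: the name attribute is usable as an element name as-is.
        named = ET.SubElement(target, child.get("name"))
        named.text = child.text
    return target


converted = to_named_elements(GENERIC)
print(ET.tostring(converted, encoding="unicode"))
# <tagList><approved>20110101</approved><color>Gold</color></tagList>
```

The type prefix (dateTag, stringTag) is dropped in this form; the data type 
would instead be declared on the range index created for each element.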

There are other possible approaches, but clean, simple data modeling is ideal 
if you can manage it.

Yours,
Damon

From: [email protected] 
[mailto:[email protected]] On Behalf Of Fullbright, Faron
Sent: Wednesday, April 25, 2012 10:04 AM
To: '[email protected]'
Subject: [MarkLogic Dev General] Difficulty modeling our data in MarkLogic

We are evaluating the potential to use MarkLogic for indexing and storage of 
content and have come across a use case that doesn't seem to map well to the 
MarkLogic indexing model.

Just wanted to describe the data model we are using (or at least that section 
of it that applies to this case), and see if we're potentially overlooking 
something.

Our primary requirement for indexing revolves around custom tags that we allow 
clients to associate with objects.  These custom tags are name/value pairs, and 
the values can have various types (string, date, datetime, real, int, etc.).

We need to be able to support fast range queries (that account for data type), 
fast ordering, and fast aggregation of distinct values across these tags.  Each 
of these operations needs to consider the tag name and value and the value's 
type.

I believe this would be a nice fit for pre-defined Range Indexes in MarkLogic 
if we had a finite, predetermined set of tag names and could create distinct 
elements for each tag name and could predefine a Range Index for each.  But 
since the set of potential tag names is unlimited, and since one tag name could 
be potentially associated with values that have multiple types, we can't really 
predefine anything.

Based on the documentation we've seen, we might be able to get the 
functionality described above to work using XPath queries against the 
standard indexes that MarkLogic builds when importing an XML document, but our 
concern is that, in the absence of Range Indexes, this would not scale (we 
need fast performance across a large number of objects, each of which would 
have a large number of tags).

Is there some way to work around this with Range Indexes?

An example fragment of data:

<item>
<tagList>
                <dateTag name="attrName1">20110101</dateTag>
                <stringTag name="attrName2">ABC</stringTag>
                <realTag name="attrName3">1.123</realTag>
                <stringTag name="attrName3">DEF</stringTag>
</tagList>
</item>

Note:  we would need dateTag values to have type date, stringTag values to have 
type string, and realTag values to have type real for purposes of filtering, 
sorting, etc.
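
To make the typing requirement concrete, here is a small Python sketch of the 
behavior we are after, run over the fragment above. It is only an illustration 
of the semantics, not a proposed implementation: the PARSERS mapping, the 
function name typed_tags, and the YYYYMMDD date format are assumptions based on 
the sample values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

FRAGMENT = """<item>
<tagList>
  <dateTag name="attrName1">20110101</dateTag>
  <stringTag name="attrName2">ABC</stringTag>
  <realTag name="attrName3">1.123</realTag>
  <stringTag name="attrName3">DEF</stringTag>
</tagList>
</item>"""

# Assumed mapping from element name to a parser for its value type.
PARSERS = {
    "dateTag": lambda s: datetime.strptime(s, "%Y%m%d").date(),
    "stringTag": str,
    "realTag": float,
}


def typed_tags(item_xml):
    """Return (tag name, type element, typed value) triples for one item."""
    root = ET.fromstring(item_xml)
    return [
        (tag.get("name"), tag.tag, PARSERS[tag.tag](tag.text))
        for tag in root.find("tagList")
    ]


tags = typed_tags(FRAGMENT)

# Range query that respects the type: real values of attrName3 in [1.0, 2.0].
in_range = [
    v for n, t, v in tags
    if n == "attrName3" and t == "realTag" and 1.0 <= v <= 2.0
]

# Distinct values keyed by (name, type), since one name can carry several types.
distinct = {}
for n, t, v in tags:
    distinct.setdefault((n, t), set()).add(v)
```

Keying on the (name, type) pair keeps the real and string values of attrName3 
apart, so each can be compared and sorted under its own type.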

Thanks,

Faron

________________________________
This email message and any attachments are for the sole use of the intended 
recipients and may contain proprietary and/or confidential information which 
may be privileged or otherwise protected from disclosure. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not an 
intended recipient, please contact the sender by reply email and destroy the 
original message and any copies of the message as well as any attachments to 
the original message. Local registered entity information: 
http://www.msci.com/legal/local_registered_entities.html

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
