Hi,
Please find a sample XML document below. Apologies for the long question.
The file name is TR1078523.xml and its structure is:
<Purchase>
<transaction-id>TR1078523-6f568ef97904 </transaction-id>
<transaction-type></transaction-type>
<product>
<title>Frame T-shirt</title>
<description>T-shirt with round neck with a rectangle inside a
rectangle inside a rectangle etc</description>
<product-id>PR1078523</product-id>
<product-item_group_id>tee21</product-item_group_id>
<product-condition>new</product-condition>
<product-availability>available for order</product-availability>
<product-price>20.00 AUD</product-price>
<product-brand>Blanc Ts</product-brand>
<product-size>L</product-size>
<product-image_link>http://ecommerce.com.ts/tee.png</product-image_link>
<product-shipping>
<product-country>AUS</product-country>
<product-service>Standard</product-service>
<product-price>10.00 AUD</product-price>
</product-shipping>
</product>
</Purchase>
The example above comes from a single file. There are many such files,
roughly 500 GB in total. All files have a similar structure, and some
probably contain more elements, which I am omitting here for various reasons.
Let's assume all of these files reside in one directory.
My task is to store them in MarkLogic.
1. I need to divide the data randomly into two parts, Part A and Part B, of
roughly 250 GB each, and store both in MarkLogic. The UI has checkboxes to
select A, B, or both.
a) If A is checked, only the files in Part A are searched. Suppose this
file falls into Part A: if the user searches for the transaction-id
'TR1078523-6f568ef97904', there is a hit and I need to show it on the
search page. If only the B checkbox is selected, nothing is shown, since
the file is not in Part B. If both A and B are checked, I need to show this
file and its various fields.
How should I store these 500 GB as two parts: as collections or as
directories? And which tool should I use? I am thinking of using mlcp, but
what if I want to store the parts as different collections?
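To make the split concrete, here is a minimal sketch in plain Python
(standard library only, not MarkLogic code) of one way to assign files
randomly to two parts of roughly equal total size. The function name
split_into_parts, the labels "A" and "B", and the file names and sizes are
all made up for illustration; the resulting label could then serve as a
collection name or a directory prefix at load time.

```python
import random


def split_into_parts(files, seed=0):
    """Randomly assign (name, size_in_bytes) pairs to part "A" or "B".

    Files are visited in random order and each one goes to whichever
    part currently holds fewer bytes, so the totals stay roughly equal.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    shuffled = list(files)
    rng.shuffle(shuffled)

    totals = {"A": 0, "B": 0}
    assignment = {}
    for name, size in shuffled:
        part = "A" if totals["A"] <= totals["B"] else "B"
        assignment[name] = part
        totals[part] += size
    return assignment, totals


# Hypothetical file names and sizes, only to show the call.
files = [("TR1078523.xml", 1200), ("TR1078524.xml", 900),
         ("TR1078525.xml", 1500), ("TR1078526.xml", 1100)]
assignment, totals = split_into_parts(files)
```

With this greedy rule the two totals never differ by more than the size of
the largest single file.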
Next question:
2. Suppose only two facets need to be shown on the UI page,
'product-country' and 'product-condition'. These two elements will have
range indexes, so I can get the counts for these facets easily.
Now I want to count the documents, in Part A only, whose product-size is
'L' [Large]. product-size is not one of the facets. What should the query
look like?
My current understanding of MarkLogic is that only the elements shown as
facets need a range index, and the others do not. Since product-size is not
a facet, I am not creating a range index for it. Is my understanding
correct?
My idea for getting counts of non-facet elements is this: product-size is
not a facet, but I would still be searching it by 'L', 'M' or 'S', so can I
create a range index for product-size to get the counts easily? Or can I
include product-size as a facet but show only product-country and
product-condition on the UI, and when the user queries for product-size,
still query for facets just to get the product-condition counts? [Maybe this
is cheating to get the counts.] Or is there a way to get the count of an
element, e.g. product-size, having 'L' as its value? [What should the query
look like?]
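To pin down the result question 2 is after, the sketch below computes it in
plain Python over in-memory documents. It is not a MarkLogic query. The docs
list, its part labels, and the helper count_value are hypothetical, and the
sample documents are trimmed down to the one element that matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical (part, xml) pairs; "A"/"B" stand in for the two halves.
docs = [
    ("A", "<Purchase><product><product-size>L</product-size></product></Purchase>"),
    ("A", "<Purchase><product><product-size>M</product-size></product></Purchase>"),
    ("B", "<Purchase><product><product-size>L</product-size></product></Purchase>"),
    ("A", "<Purchase><product><product-size>L</product-size></product></Purchase>"),
]


def count_value(docs, parts, element, value):
    """Count documents in the selected parts where <element> equals value."""
    hits = 0
    for part, xml_text in docs:
        if part not in parts:
            continue
        root = ET.fromstring(xml_text)
        # iter() walks all descendants, so the element may sit at any depth.
        if any((e.text or "").strip() == value for e in root.iter(element)):
            hits += 1
    return hits


count_a = count_value(docs, {"A"}, "product-size", "L")
count_both = count_value(docs, {"A", "B"}, "product-size", "L")
```

Here count_a covers the "A only" checkbox and count_both covers the case
where both checkboxes are selected.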
3. How do I tell MarkLogic that I want the values of the <description>
element, which contain a bunch of words, to be available for word counting?
Is this possible in MarkLogic?
The user will search for a string, say 'Large', and MarkLogic will return,
say, 10000 documents. From these 10000 documents I need to take the
<description> element values containing those words and do a word count
over them.
E.g. if 'neck' appears most often in <description> with a count of 2000,
followed by 'inside' with 100 and so on, I need to show:
1. neck (2000)
2. inside (100) and so on.
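As an illustration of the output I am describing, here is a minimal sketch
in plain Python (standard library only, not MarkLogic code). The results
list stands in for the documents returned by the search, and the helper
description_word_counts is a hypothetical name.

```python
import re
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical search results; in practice these would be the matching documents.
results = [
    "<product><description>T-shirt with round neck</description></product>",
    "<product><description>Round neck, rectangle inside a rectangle</description></product>",
]


def description_word_counts(xml_docs):
    """Count lower-cased words across all <description> elements."""
    word = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # keeps "t-shirt" as one word
    counts = Counter()
    for xml_text in xml_docs:
        root = ET.fromstring(xml_text)
        for desc in root.iter("description"):
            counts.update(word.findall((desc.text or "").lower()))
    return counts


counts = description_word_counts(results)
# Print a ranked list in the "1. neck (2000)" style described above.
for rank, (term, n) in enumerate(counts.most_common(3), start=1):
    print(f"{rank}. {term} ({n})")
```

Doing this client-side over 10000 documents means fetching every
description, which is why I am asking whether MarkLogic can produce these
counts itself.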
It's long, but I hope someone can point me in the right direction.
Thanks
On Sat, Feb 14, 2015 at 1:20 AM, Michael Blakeley <[email protected]> wrote:
> Show us some XML. It's difficult to decipher what you mean without
> concrete examples.
>
> Don't rule out anything at this point. You may need a new range index. You
> may have to use XQuery.
>
> -- Mike
>
> > On 13 Feb 2015, at 10:50 , Maisnam Ns <[email protected]> wrote:
> >
> > Hi,
> >
> > Can someone help with this use case.
> >
> > I have a huge XML data set in which product is one of the elements. I
> > want to find the top 10 products from this data.
> >
> > Product is not in the range index and will not be part of facets.
> > How to search this with JAVA API and not with xquery.
> >
> > Secondly, I need to divide the data into two parts. In MarkLogic there
> > are directories and collections.
> >
> > But how do I search for a string in, say, Part A if the data is divided
> > into Part A and Part B? There is an option to select just Part A, just
> > Part B, or both. Depending on the selection, the string has to be
> > searched in Part A, in Part B, or in both A and B.
> >
> > Please let me know how to do this in Java. A snippet of code will be
> > highly appreciated.
> >
> > Also, Information Studio in MarkLogic does not provide any option for
> > collections; it only provides for different directories.
> >
> > Thanks in advance
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general