Hi, Maisnam:

Value and tuple requests are supported over URI or collection lexicons, range 
indexes, field range indexes, and geospatial indexes.

Given that the document set is quite small, you could create a string range 
index over the description, execute a value request on the description, split 
the description into words on the client side, and calculate the per-document 
word counts on the client side.
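
The client-side step could look roughly like the sketch below. It is only an illustration: the class and method names are made up, the description string is hard-coded rather than read from a values response, and treating words as case-insensitive runs of letters or digits is an assumption you may want to change.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of the MarkLogic Java API.
class DescriptionWordCount {

    // Counts how often each word occurs in a single description string.
    // Assumption: a word is a run of letters or digits, compared
    // case-insensitively; everything else is treated as a separator.
    static Map<String, Integer> countWords(String description) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : description.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // leading separators produce an empty first token
            }
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // In a real client these strings would be the description values
        // returned by the value request; here one is hard-coded.
        String[] descriptions = {
            " This is an example description and such example benefits the description. "
        };
        for (String description : descriptions) {
            System.out.println(countWords(description));
        }
    }
}
```

For the sample description, "example" and "description" are each counted twice and every other word once.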

For larger document sets, you can calculate the number of documents with each 
word.

Using the cts:element-words() function in server-side XQuery (or, in MarkLogic 
8, using the cts.elementWords() function in server-side JavaScript), you can 
use an element word lexicon to get a list of words used in an element.  The 
documentation shows a technique that can be adapted to count the number of 
documents with each word:

    http://docs.marklogic.com/guide/search-dev/lexicon#id_95439

The Java API provides an interface for installing and executing server-side 
XQuery or JavaScript:

    http://docs.marklogic.com/guide/java/resourceservices#id_27702


Hoping that helps,


Erik Hennum

________________________________
From: [email protected] 
[[email protected]] on behalf of Maisnam Ns 
[[email protected]]
Sent: Saturday, February 21, 2015 9:19 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] JAVA API Query formation

Hi Erik,

Is word count possible in MarkLogic? What do I need to configure in the Admin 
UI 8000, if this is possible? Can I enable a word lexicon and get the frequencies?
Say I have this XML, and 1000 such XML files. Now I want to do a word count on the 
description element.
<info>
  <company>ibm</company>
  <year>2001</year>
  <country>US</country>
<description> This is an example description and such example benefits the 
description. </description>
</info>

So, in the above xml , I want to get something like
example (2)
description (2)
This  (1) .....etc

Thanks



On Sat, Feb 21, 2015 at 9:13 PM, Maisnam Ns <[email protected]> wrote:
Thanks Erik

On Sat, Feb 21, 2015 at 8:32 PM, Erik Hennum <[email protected]> wrote:
Hi, Maisnam:

Your query options should define a tuple in which the first column is a range 
index on the country and the second column is a range index on the year.


<options xmlns="http://marklogic.com/appservices/search">
    <tuples name="yearByCountry">
        <range type="xs:string" collation="http://marklogic.com/collation/">
            <element ns="" name="country"/>
        </range>
        <range type="xs:gYear">
            <element ns="" name="year"/>
        </range>
    </tuples>
</options>


After writing the "yearByCountry" query options to the server, you can then use 
the options to request tuples from the range indexes:


QueryManager queryMgr = dbClient.newQueryManager();
TuplesHandle results =
    queryMgr.tuples(queryMgr.newValuesDefinition("yearByCountry"),
                    new TuplesHandle());
Tuple[] tuples = results.getTuples();

You can then iterate over the tuples to get the counts on the frequency of 
co-occurrence of each country and year.

For more information about defining tuples:

http://docs.marklogic.com/guide/rest-dev/appendixb#id_90089
http://docs.marklogic.com/guide/rest-dev/search#id_24433

For more information about making a tuple request:

http://docs.marklogic.com/javadoc/client/com/marklogic/client/query/QueryManager.html#newValuesDefinition(java.lang.String)
http://docs.marklogic.com/javadoc/client/com/marklogic/client/query/QueryManager.html#tuples(com.marklogic.client.query.ValuesDefinition,%20T)
http://docs.marklogic.com/javadoc/client/com/marklogic/client/query/Tuple.html#getCount()


Hoping that helps,


Erik Hennum

________________________________
From: [email protected] 
[[email protected]] on behalf of Maisnam Ns 
[[email protected]]
Sent: Saturday, February 21, 2015 12:02 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] JAVA API Query formation

Hi Eric,

Given this scenario:
Let's say this is file 1 and there are 1000 such different files
<info>
  <company>ibm</company>
  <year>2001</year>
  <country>US</country>
</info>

How do I get the count for each year where country = 'US', using the Java API?

2001 - (20)
2002- (5)
2009 -(0)  etc

Thanks



On Sat, Feb 21, 2015 at 1:27 AM, Maisnam Ns <[email protected]> wrote:
Thanks Eric for your help. Will try to use XMLStreamWriter.

On Fri, Feb 20, 2015 at 11:09 PM, Erik Hennum <[email protected]> wrote:
Hi, Maisnam:

To get uncorrelated frequencies for three elements, you'll need to make three 
separate requests, one for each element.

Just so you're aware, you can also request tuples for the three elements, but 
that request returns the frequencies for the co-occurrence of values in a 
document and not the individual frequencies for each element.

By the way, the query options builder has been deprecated for several releases 
and could go away in any future release.  You should instead use a DOM (such as 
JDOM or XOM) or XMLStreamWriter to generate the options XML.
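
As a rough sketch of the XMLStreamWriter route, the class below builds a tuples options document for the country and year elements using only the JDK. The class name, method names, and the "yearByCountry" options name are illustrative assumptions, not part of the Java API, and the collation URI is the root collation, which must match whatever collation your range index actually uses.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class that serializes query options with the JDK's StAX writer.
class TuplesOptionsWriter {
    static final String SEARCH_NS = "http://marklogic.com/appservices/search";
    // Assumption: the country range index uses the root collation.
    static final String COLLATION = "http://marklogic.com/collation/";

    // Returns the options XML as a string.
    static String buildOptions() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("options");
            w.writeDefaultNamespace(SEARCH_NS);
            w.writeStartElement("tuples");
            w.writeAttribute("name", "yearByCountry");
            writeRange(w, "xs:string", COLLATION, "country");
            writeRange(w, "xs:gYear", null, "year");
            w.writeEndElement(); // tuples
            w.writeEndElement(); // options
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write options", e);
        }
        return out.toString();
    }

    // Writes one <range> column; the collation attribute is skipped when null.
    static void writeRange(XMLStreamWriter w, String type, String collation,
                           String elementName) throws XMLStreamException {
        w.writeStartElement("range");
        w.writeAttribute("type", type);
        if (collation != null) {
            w.writeAttribute("collation", collation);
        }
        w.writeEmptyElement("element");
        w.writeAttribute("ns", "");
        w.writeAttribute("name", elementName);
        w.writeEndElement(); // range
    }

    public static void main(String[] args) {
        System.out.println(buildOptions());
    }
}
```

The resulting string can then be wrapped in a handle and written to the server as named query options before making the request.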


Hoping that helps,


Erik Hennum

________________________________
From: [email protected] 
[[email protected]] on behalf of Maisnam Ns 
[[email protected]]
Sent: Friday, February 20, 2015 2:40 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] JAVA API Query formation

Hi ,

Can someone help me with the JAVA API query formation for the below sample

Let's say this is file 1 and there are 1000 such different files
<info>
  <company>ibm</company>
  <year>2001</year>
  <country>US</country>
</info>

I just want to get the country, year and the count.

US 2001  70
US 2014   13
JAPAN 2000 10

I want something like the above, but I am only able to get the count for one 
element, not two:

QueryOptionsHandle options = new QueryOptionsHandle().withValues(
            qob.values("product",
                    qob.range(
                        qob.elementRangeIndex(new QName("country"),
                            qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))),
                    "frequency-order"));
The above query gives me

US 190
CH  123
IND  70


Thanks



_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general



