Hi Jason,


What you said makes sense to me, and it does perform much better. I'll 
follow your advice.

Thanks a lot,
Helen

________________________________
From: [email protected] 
[[email protected]] on behalf of Jason Hunter 
[[email protected]]
Sent: Tuesday, May 03, 2011 5:25 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] question about search

In 4.2 this is the best way.  You could do the call against <refauth> directly 
but you'll get "JamesWang" instead of "James Wang".

In my experience, deployments at scale work best if you develop a system where 
it's easy and automatic to reformat content to support new requirements.  
MarkLogic does a lot to make it possible to query your data as-is, but there's 
a limit to what can be done, and to get maximum performance you'll often want 
to tweak it.  People often do this by adding a <metadata> block at the top 
outside the <main> content.  Some have a "source" database and "compiled" 
database, where the source is the raw data you'll want humans to see, and it 
goes through a transformation step to the compiled database to optimize it for 
the application's deployment.

Imagine, for example, you want to sort articles by title.  If you want to 
ignore leading words "A", "An", and "The", then you probably want to create a 
"sortable-title" element or attribute with the leading word removed or placed 
at the end.  You don't want to dynamically remove those words at query time 
against millions of articles.
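
As a rough illustration of that transform step, here is a minimal Python sketch that computes a sortable title once, at load time. The function name sortable_title and the "rest, article" output format are assumptions made for this example, not anything MarkLogic prescribes; the result would be stored in whatever "sortable-title" element or attribute you choose.

```python
import re

# Leading words to ignore when sorting. The list comes from the examples
# in this thread ("A", "An", "The"); extend it as your content requires.
_LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+(.*)$", re.IGNORECASE)


def sortable_title(title: str) -> str:
    """Return the title with a leading article moved to the end.

    Hypothetical helper: titles without a leading article are returned
    unchanged apart from surrounding whitespace being trimmed.
    """
    stripped = title.strip()
    match = _LEADING_ARTICLE.match(stripped)
    if match is None:
        return stripped
    article, rest = match.group(1), match.group(2)
    return f"{rest}, {article}"
```

Because the value is computed once per document, sorting at query time can use the stored value directly instead of rewriting millions of titles per request.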

As another example, if you want to show all articles starting with "R" you 
might want an element or attribute start-letter="r" which makes this lookup a 
simple term list fetch and thus very lightweight.  You can do it without that 
attribute using a range index, but that's a bit less efficient.

Or let's say you want page counts or word counts on your articles.  That 
shouldn't go in the source that authors see, but it needs to be somewhere so 
the app can use it.  Put it in your transform step.
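
A transform step along these lines might look like the following Python sketch, which builds a small metadata block holding a start letter and a word count. The names build_metadata, start-letter, and word-count are hypothetical, and splitting on whitespace is only one simple definition of a "word".

```python
import xml.etree.ElementTree as ET


def build_metadata(title: str, body_text: str) -> ET.Element:
    """Build a <metadata> element with derived, app-only values.

    Hypothetical layout: a start-letter attribute for cheap "starts
    with R" lookups and a <word-count> child element.
    """
    meta = ET.Element("metadata")

    # First alphanumeric character of the title, lowercased; an empty
    # string if the title has none.
    first = next((ch for ch in title if ch.isalnum()), "")
    meta.set("start-letter", first.lower())

    # Whitespace-separated token count as a simple word count.
    word_count = ET.SubElement(meta, "word-count")
    word_count.text = str(len(body_text.split()))
    return meta
```

The source documents that authors see stay untouched; only the compiled copy carries the extra block.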

So my advice is treat the addition of a new element to support faster queries 
or extra features (like a word count) as not unusual, plan for it, and make it 
easy and automatic in your system.

-jh-

On May 3, 2011, at 1:56 PM, Helen Chen wrote:

Hi Jason,

Your understanding is correct. I tried it using some other data and it seems 
to work fine.

The only problem is that we have a very large data set, and adding a new 
element means touching every document to build that element for this 
search. In practice, any time a new requirement like this comes up, I 
have to change all the data to construct a new element that fits the search.  
Is there a function similar to cts:element-values() that works on a field, 
something like cts:field-values()? My thinking is: refauth is a simple 
element that only has fname and surname, so if I could create a field that 
combines the values inside refauth, that field would serve the same purpose 
as the new <referencedauthor> element.

Or is there any other workaround?  The bottom line is that if this is the 
only way, I'll change all the data for it.

Thanks, Helen


From: Jason Hunter <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tue, 3 May 2011 13:25:24 -0700
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] question about search

You can use cts:frequency($author) to get the number of times the author was 
cited.  If the same person might be cited multiple times in an article and you 
want to count that, you'll want to specify "item-frequency" as an option to the 
cts:element-values() call.  The default "fragment-frequency" would count 
several citations in the same article as just one.
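
To make the difference between the two counting modes concrete, here is a small Python sketch that mimics them on plain lists. It is not MarkLogic API: top_authors and per_item are hypothetical names, and it assumes one article corresponds to one fragment.

```python
from collections import Counter


def top_authors(articles, n=5, per_item=True):
    """Rank cited authors across articles.

    articles: a list of articles, each a list of cited author names.
    per_item=True counts every citation (like "item-frequency");
    per_item=False counts an author at most once per article
    (like "fragment-frequency", assuming one article per fragment).
    """
    counts = Counter()
    for cited in articles:
        counts.update(cited if per_item else set(cited))
    return counts.most_common(n)
```

With one article citing "James Wang" three times and "Tom Ding" once, and a second article citing only "Tom Ding", the per-item count ranks James Wang first (3 to 2) while the per-article count ranks Tom Ding first (2 to 1).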

Hopefully I'm understanding what you want.

-jh-

On May 3, 2011, at 1:15 PM, Helen Chen wrote:

Hi Jason,

cts:element-values() will return the unique list of referencedauthor values, 
but it does not tell me which one shows up (is cited) most.  The basic idea 
is to count how many times each refauth shows up and have the search return 
the top 1 or top 5.

Would a field help with this?

Thanks,
Helen



From: Jason Hunter <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tue, 3 May 2011 13:08:58 -0700
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] question about search

Hi Helen,

Add an element like <referencedauthor>James Wang</referencedauthor> in the 
document, perhaps in a new metadata block up top.  Put a range index on the 
chosen QName of type xs:string.  Then use cts:element-values() to extract the 
referenced authors.  You can pass a cts:query call to the function if you want 
to limit to just articles matching a query.  This approach will be fast at 
scale.  With the content shaped like you have right now, there's not an 
optimized way to do this at scale.
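
One way to produce that element before loading is sketched below in Python. This is only an illustration under assumptions: the function name add_referenced_authors is hypothetical, the metadata block is simply inserted as the first child of <article>, and the element names follow the sample in this thread.

```python
import xml.etree.ElementTree as ET


def add_referenced_authors(article: ET.Element) -> ET.Element:
    """Insert a <metadata> block with one <referencedauthor> per <refauth>.

    Each value is "fname surname" joined with a single space, so the
    indexed value reads "James Wang" rather than "JamesWang".
    Note: calling this twice on the same article adds a second block.
    """
    meta = ET.Element("metadata")
    for refauth in article.iter("refauth"):
        fname = (refauth.findtext("fname") or "").strip()
        surname = (refauth.findtext("surname") or "").strip()
        full_name = " ".join(part for part in (fname, surname) if part)
        if full_name:
            ET.SubElement(meta, "referencedauthor").text = full_name
    # Put the derived block at the top, ahead of the original content.
    article.insert(0, meta)
    return article
```

Repeated citations of the same author produce repeated <referencedauthor> elements, which keeps per-citation counting possible later.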

-jh-

On May 3, 2011, at 12:13 PM, Helen Chen wrote:

Hello there,

We have article XML in MarkLogic. Each article lists the references that it 
cites. I want to run a search that finds which author under 
/article/back/references/citation/ref/jcite is referenced most, or get a 
list of the top 5 refauth values that show up most often in the reference 
sections of the articles.

The article structure looks like the following:
<article>
    <front>…</front>
    <back>
        <references>
            <citation id="c1">
                <ref>
                    <jcite>
                        <refauth>
                            <fname>James</fname>
                            <surname>Wang</surname>
                        </refauth>
                        <jtitle>article title</jtitle>
                        <coden>AAA</coden>
                        <issn>1111</issn>
                        <volume>1</volume>
                        <pages>90</pages>
                        <date>2007</date>
                    </jcite>
                </ref>
            </citation>
        </references>
        <references>
            <citation id="c2">
                <ref>
                    <jcite>
                        <refauth>
                            <fname>Tom</fname>
                            <surname>Ding</surname>
                        </refauth>
                        <jtitle>my article title</jtitle>
                        <coden>AAB</coden>
                        <issn>1112</issn>
                        <volume>1</volume>
                        <pages>20</pages>
                        <date>2008</date>
                    </jcite>
                </ref>
            </citation>
        </references>
    </back>
</article>


Can anyone suggest how to do this, or how to get started?

Thanks, Helen
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general