Re: [MarkLogic Dev General] Marklogic database fields over large text

Abhishek53 S Tue, 14 Jun 2011 05:06:59 -0700

Hi Ryan
Thanks for the initiative! I am aware of the problem. I have already 
restructured the larger text and overridden the snippet-generation logic of 
the search:search API.


Abhishek Srivastav
Systems Engineer
Tata Consultancy Services
Cell:- +91-9883389968
Mailto: abhishek5...@tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty.   IT Services
                        Business Solutions
                        Outsourcing
____________________________________________



From:
"seme...@hotmail.com" <seme...@hotmail.com>
To:
<general@developer.marklogic.com>
Date:
06/14/2011 05:25 PM
Subject:
Re: [MarkLogic Dev General] Marklogic database fields over large text
Sent by:
general-boun...@developer.marklogic.com



If you use search:search you may have a performance problem with 
highlighting. We've seen this happen with large text files where MarkLogic 
could find matching documents really fast but creating the highlight was 
really slow. This is because it takes time to find the actual matching 
text within a large amount of text. We tried changing several indexing 
options but nothing really had any effect on the highlighting step. Our 
solution was to just add structure to any large text files. After converting 
them to XML and replacing line breaks with <p> elements, the highlighting 
was much faster.
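
As a rough illustration of that restructuring step, here is a minimal 
Python sketch that wraps each non-empty line of a plain-text file in its 
own <p> element. The function name wrap_paragraphs and the root element 
name "doc" are placeholders chosen for this example; they are not taken 
from the thread or from any MarkLogic API.

```python
from xml.sax.saxutils import escape


def wrap_paragraphs(text: str, root: str = "doc") -> str:
    """Wrap each non-empty line of plain text in its own <p> element."""
    # "doc" is a hypothetical root element name used only for illustration.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # escape() protects &, < and > so the result stays well-formed XML.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<{root}>{body}</{root}>"


if __name__ == "__main__":
    sample = "First line of a large file.\n\nSecond line with A & B.\n"
    print(wrap_paragraphs(sample))
```

The output of a step like this would be loaded in place of the original 
single large text value.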

I think that what happens is that MarkLogic indexes the exact values, the 
words, and word positions (depending on your DB settings), so queries run 
fast as long as you never have to actually retrieve the entire value of 
the large text element. For highlighting, the entire actual value is 
retrieved and then parsed to find where in that value the match is, and 
that can take some time. I would expect that if your XPath or cts:queries 
have to be filtered you would have a performance problem there, too, 
because the filtering step needs to open the doc and confirm that the 
candidate document does indeed match the query, which may involve parsing 
through that large text field. But with structure added, the element 
values to retrieve are much smaller (whatever is between the <p> tags), so 
highlighting and filtering are much faster.
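
To make the size argument concrete, here is a toy cost model in Python. It 
is only a sketch of the reasoning above, not a description of MarkLogic 
internals: it assumes the index already tells us which element values 
contain the term (the substring test stands in for that lookup) and counts 
how many characters the highlighter would then have to read. The function 
name chars_scanned is hypothetical.

```python
def chars_scanned(elements: list[str], term: str) -> int:
    """Toy model: total length of every element value that contains the term."""
    needle = term.lower()
    # The `in` test is a stand-in for an index lookup; only the values it
    # selects are counted as "retrieved and parsed" for highlighting.
    return sum(len(value) for value in elements if needle in value.lower())


if __name__ == "__main__":
    paragraphs = ["filler text " * 50 for _ in range(200)]
    paragraphs[17] = "the needle is in this paragraph"
    one_big_value = ["\n".join(paragraphs)]

    # One large value: the whole text has to be read to place the highlight.
    print(chars_scanned(one_big_value, "needle"))
    # Paragraph structure: only the matching paragraph has to be read.
    print(chars_scanned(paragraphs, "needle"))
```

Under this model the work per match depends on the size of the matching 
element rather than the size of the document, which is consistent with 
what we observed after adding the <p> structure.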

So I would add structure for sentences perhaps and not have it be one 
large text value. But test it out and see what you find.

-Ryan

To: general@developer.marklogic.com
From: abhishek5...@tcs.com
Date: Tue, 14 Jun 2011 16:43:48 +0530
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large 
text


Thanks Geert! I also expect the field to share the indexing that is 
already in place from the database configuration. 
Regards
Abhishek Srivastav


From: 
Geert Josten <geert.jos...@daidalos.nl> 
To: 
General MarkLogic Developer Discussion <general@developer.marklogic.com> 
Date: 
06/14/2011 04:10 PM 
Subject: 
Re: [MarkLogic Dev General] Marklogic database fields over large text 
Sent by: 
general-boun...@developer.marklogic.com




Hi Abhishek, 
  
Right, database fields. I overlooked that bit in my previous reply. But 
most of it still stands. The documentation (
http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/admin/fields.xml&query=database+field
) states that field queries are similar to word queries, but constrained to 
specific elements. The word index is not optional; it is always there. As 
far as I know, element context is already available as well, so I would 
expect fields to reuse the available index information for the most part. 
  
On the other hand, if your three elements already make up the major part 
of your documents, why use fields at all? Is it to explicitly exclude 
information in other elements? I don't think using a field will give much 
performance improvement. 
  
As always, however, the best way to know is to measure. Gajanan summarizes 
good alternatives. If performance is that important, then I recommend 
simply comparing the performance of all of them. ;-) 
  
Kind regards, 
Geert 
  
From: general-boun...@developer.marklogic.com [
mailto:general-boun...@developer.marklogic.com] On Behalf Of Abhishek53 S
Sent: Tuesday, 14 June 2011 12:14
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic database fields over 
large text 
  

Thanks Geert! You are on the right track with my question. Say some 
indexing options are already enabled on my database, and I need to enable 
the same options again on the field. Won't that duplicate the indexes and 
increase the index size? 

Correct me if I am wrong. I am worried about the index size because the 
elements belonging to the database field make up the major portion of the 
total content. 

As I understand the indexing mechanism, once a document is ingested, both 
database-level and field-level indexes are built, so settings common to 
both would consume space twice. 

Thanks & Regards
Abhishek Srivastav
From: 
Geert Josten <geert.jos...@daidalos.nl> 
To: 
General MarkLogic Developer Discussion <general@developer.marklogic.com> 
Date: 
06/14/2011 01:31 PM 
Subject: 
Re: [MarkLogic Dev General] Marklogic database fields over large text 
Sent by: 
general-boun...@developer.marklogic.com

  






Hi Abhishek, 
 
As far as I know, when talking about the word index, the index is what it 
says: an index of words. The element value is tokenized into unique words, 
and the search term of a word query likewise. All individual words of the 
search term are looked up in the word index. After that, the results 
are filtered on other criteria. That way the length of the element value 
shouldn't matter much. 
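
A minimal Python sketch of that idea follows. It is a generic inverted 
word index, not MarkLogic's actual implementation; the function names 
(tokenize, build_word_index, candidates) and the document ids are made up 
for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its unique words."""
    return set(re.findall(r"\w+", text.lower()))


def build_word_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each unique word to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def candidates(index: dict[str, set[str]], phrase: str) -> set[str]:
    """Return ids of documents that contain every word of the search phrase."""
    words = tokenize(phrase)
    if not words:
        return set()
    # One lookup per search word; the cost does not depend on how long the
    # indexed element values were.
    return set.intersection(*(index.get(word, set()) for word in words))


if __name__ == "__main__":
    docs = {
        "a.xml": "Very large text about fields",   # hypothetical documents
        "b.xml": "Short note about indexes",
    }
    index = build_word_index(docs)
    print(candidates(index, "large text"))
    print(sorted(candidates(index, "about")))
```

The set returned by candidates() corresponds to the unfiltered result; 
anything that depends on word order or other criteria would still have to 
be checked in a later filtering step.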
 
When talking about element value queries: those usually concern an exact 
match. The length of the element value shouldn't make much difference here 
either. I am not aware of any length limitations on the value index. 
 
Kind regards, 
Geert 
 
From: general-boun...@developer.marklogic.com [
mailto:general-boun...@developer.marklogic.com] On Behalf Of Abhishek53 S
Sent: Tuesday, 14 June 2011 9:35
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Marklogic database fields over large 
text 
 

Hi All 

My question is about database fields. Sample content is as follows: 

<sample> 
      <title>EST</title> 
      <data> 
              <inner> 
                      <title>ABC.</title> 
                      <text1>Very large text.....Very large text.....Very large text.....Very large text.....Very large text.....</text1> 
              </inner> 
              <inner> 
                      <title>XYZ.</title> 
                      <text2>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text2> 
              </inner> 
              <inner> 
                      <title>RST</title> 
                      <text3>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text3> 
              </inner> 
      </data> 
</sample> 

I need to perform a constrained search over the elements text1, text2, and 
text3 under the data element. 
Suppose I take the database field approach, include text1, text2, and 
text3 in a field, and write a constrained field query against it. Is this 
a good approach with performance in mind, given that text1, text2, and 
text3 contain very large text? 



Thanks in advance
Abhishek Srivastav
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
