If you use search:search, you may run into a performance problem with
highlighting. We've seen this happen with large text files: MarkLogic found the
matching documents very quickly, but creating the highlights was really slow,
because it takes time to locate the actual matching text within a large amount
of text. We tried changing several indexing options, but nothing had any real
effect on the highlighting step. Our solution was simply to add structure to
any large text files. After we converted them to XML and replaced line breaks
with <p> elements, highlighting was much faster.
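
As a rough sketch of that conversion (Python here purely for illustration; the
function name and the `doc` root element are my own assumptions, and you could
just as well do this in XQuery at load time), each non-empty line becomes its
own <p> element:

```python
import xml.etree.ElementTree as ET


def text_to_paragraph_xml(text: str, root_name: str = "doc") -> str:
    """Wrap each non-empty line of `text` in its own <p> element.

    `root_name` is a hypothetical wrapper element, not something MarkLogic
    requires.
    """
    root = ET.Element(root_name)
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines so no empty <p/> elements are emitted
            ET.SubElement(root, "p").text = line
    # ElementTree escapes &, < and > in the text for us.
    return ET.tostring(root, encoding="unicode")


sample = "First line of a large file.\n\nSecond line, with <markup> & symbols.\n"
xml_out = text_to_paragraph_xml(sample)
print(xml_out)
```

Splitting on line breaks is only one choice; any unit that keeps the individual
element values small should serve the same purpose.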

I think what happens is that MarkLogic indexes the exact values, the words,
and the word positions (depending on your DB settings), so queries run fast as
long as you never have to actually retrieve the entire value of the large text
element. For highlighting, the entire value is retrieved and then parsed to
find where in that value the match is, and that can take some time. I would
expect a similar performance problem if your XPath or cts:query expressions
have to be filtered, because the filtering step needs to open the doc and
confirm that the candidate document does indeed match the query, which may
involve parsing through that large text field. But with structure added, the
element values to retrieve are much smaller (whatever is between the <p> tags),
so highlighting and filtering are much faster.
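
To make the cost of the highlighting step concrete, here is a toy highlighter
in Python. It is only a sketch of the general idea (scan a text value and wrap
the matches), not how MarkLogic implements highlighting; the `highlight`
function and the <b> wrapper are my own assumptions. The point is that the scan
works on whatever value it is handed, so handing it one short paragraph at a
time keeps each scan small.

```python
import re


def highlight(value: str, term: str) -> str:
    """Wrap each whole-word, case-insensitive occurrence of `term` in <b> tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", value)


# With structure added, each value to scan is a single short paragraph.
paragraphs = [
    "The word index is always on.",
    "Nothing relevant here.",
    "Index size matters.",
]
highlighted = [highlight(p, "index") for p in paragraphs]
for line in highlighted:
    print(line)
```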

So I would add structure, perhaps at the sentence level, rather than keep one
large text value. But test it out and see what you find.

-Ryan

To: general@developer.marklogic.com
From: abhishek5...@tcs.com
Date: Tue, 14 Jun 2011 16:43:48 +0530
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text

Thanks, Geert! I also expect that a field will share the indexing that is
already configured for the database.

Regards
Abhishek Srivastav
Systems Engineer
Tata Consultancy Services
Cell:- +91-9883389968
Mailto: abhishek5...@tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services, Business Solutions, Outsourcing
____________________________________________

From: Geert Josten <geert.jos...@daidalos.nl>
To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: 06/14/2011 04:10 PM
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text
Sent by: general-boun...@developer.marklogic.com

Hi Abhishek,

 

Right, database fields. I overlooked that bit in my previous reply, but most
of it still stands. The documentation
(http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/admin/fields.xml&query=database+field)
states that a field query is similar to a word query, but constrained to
specific elements. The word index is not optional; it is always there. I
thought that element context was already available as well, so I would expect
fields to reuse the available index information for the most part.

 

On the other hand, if your three elements already make up the major part of
your documents, why use fields at all? Is it to explicitly exclude information
in other elements? I don’t think using a field will give much performance
improvement.

 

As always, however, the best way to know is to measure. Gajanan summarizes
good alternatives. If performance is that important, then I recommend simply
comparing the performance of all of them. ;-)

Kind regards,
Geert

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com]
On behalf of Abhishek53 S
Sent: Tuesday, 14 June 2011 12:14
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text

Thanks, Geert! You are on the right track with my question. Suppose some
indexing options are already enabled on my database. If I have to enable the
same options again on the field, won't that duplicate the index and increase
its size?

Correct me if I am wrong. I am worried about the index size because the
elements belonging to the database field make up the major portion of the
total content.

As I understand the indexing mechanism, once a document is ingested, both the
database-level index and the field-level index are built, so settings common
to both would consume space twice.

Thanks & Regards

Abhishek Srivastav

From: Geert Josten <geert.jos...@daidalos.nl>
To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: 06/14/2011 01:31 PM
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text
Sent by: general-boun...@developer.marklogic.com

Hi Abhishek,

As far as I know, when talking about the word index, the index is what it
says: an index of words. The element value is tokenized into unique words, and
so is the search term of a word query. Each individual word of the search term
is looked up in the word index. After that, the results are filtered on other
criteria. That way the length of the element value shouldn’t matter much.
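
To illustrate the idea, here is a toy word index in Python. It is only a
sketch of a generic inverted index under my own assumptions (the function
names and the sample documents are made up), not MarkLogic's actual
implementation. A query costs one lookup per search word, however long the
indexed element values are, and the result is a set of candidates that would
still need filtering on other criteria.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_word_index(docs):
    """Map each unique word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(tokenize(text)):
            index[word].add(doc_id)
    return index


def word_query(index, phrase):
    """Return candidate documents containing every word of `phrase`.

    Word order is not checked here; that would be part of a later
    filtering step.
    """
    words = tokenize(phrase)
    if not words:
        return set()
    candidates = set(index.get(words[0], set()))
    for word in words[1:]:
        candidates &= index.get(word, set())
    return candidates


# Hypothetical sample documents.
docs = {
    "a.xml": "Very large text about database fields",
    "b.xml": "Short text",
    "c.xml": "Large database",
}
index = build_word_index(docs)
print(sorted(word_query(index, "large database")))
```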

  

When talking about element value queries: those usually concern an exact
match. The length of the element value shouldn’t make much difference here
either. I am not aware of any length limitations on the value index.

Kind regards,
Geert

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com]
On behalf of Abhishek53 S
Sent: Tuesday, 14 June 2011 9:35
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Marklogic database fields over large text

Hi All,

My question is about database fields. Sample content is as follows:

<sample>
    <title>EST</title>
    <data>
        <inner>
            <title>ABC.</title>
            <text1>Very large text.....Very large text.....Very large text.....Very large text.....Very large text.....</text1>
        </inner>
        <inner>
            <title>XYZ.</title>
            <text2>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text2>
        </inner>
        <inner>
            <title>RST</title>
            <text3>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text3>
        </inner>
    </data>
</sample>

I need to perform a constrained search over the elements text1, text2, and
text3 under the data element.

If I take the database field approach and add text1, text2, and text3 as
included elements of a field, so that I can write a constrained field query,
is that a good approach from a performance point of view, given that text1,
text2, and text3 hold very large text?

Thanks in advance

Abhishek Srivastav

_______________________________________________

General mailing list

General@developer.marklogic.com

http://developer.marklogic.com/mailman/listinfo/general
