Thanks for the reply, Jason. The reason I asked that question is because
of the following email trail we had with Colleen Whitney at MarkLogic.
If you want, I could send some sample asset nodes from our assets.xml
document.



Hi,
I am carrying on with the discussion that one of my colleagues was having
with you. As you suggested, I now have one document for each asset. The
issue is that we have around 40,000 assets, which results in 40,000
documents in our WebDAV store, all belonging to a collection named
"allassets". This makes it very difficult for us to move the files
around or manage them when we have to delete or replace one. That said,
I still went with the approach you suggested: I created an individual
XML file for each asset, with each file belonging to the "allassets"
collection. But the following query, which returns the first 20 assets
in the DETAILS stage, takes about 3 seconds:

let $asset-list :=
  collection("allassets")/asset[string(stage) = "DETAILS"]
    [asset-categorization-node/super-category/category/@id = "2"][1 to 20]
return $asset-list

Each individual asset file looks like this; the parent node is <asset>:
<asset id="3000009">
    <file-name></file-name>
</asset>

Whereas if I use the single document to retrieve the records, it takes
well under a second, maybe 1/10th of a second; it is very fast:

let $asset-list :=
  doc("/assets.xml")/assets/asset[string(stage) = "DETAILS"]
    [asset-categorization-node/super-category/category/@id = "2"][1 to 20]
return $asset-list

where the assets.xml document looks like this:
<assets>
    <asset id="3333333">
    </asset>
    <asset id="3111111">
    </asset>
    <asset id="3322222">
    </asset>
    <!-- ... up to 40,000 assets -->
</assets>

Could you please elaborate on this?

The other question I have: we have been using an assets.xml file that
has around 40,000 asset nodes. We have declared a fragment root on the
common element name "asset", which makes the structure fragmented. So
why does the server load the entire document into memory, or lock up the
entire document, when there is an update or insert in a specific portion
of the document?


Any advice?
Thanks
Rashid

-----Original Message-----
From: Colleen Whitney [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 17, 2007 11:54 AM
To: Engelke, John
Cc: [EMAIL PROTECTED]
Subject: RE: Question on XQuery

John, 

Thanks for your patience.  We'd like to address three things here:  1) a
concern about your data model that will impact scalability, 2) some
recommendations for tuning cache settings to reduce memory consumption,
as promised, and 3) a reminder about swap space.

1.  Data model:

Taking a step back and looking at your sample queries, we'd like to
raise one concern regarding your data model that will go to the question
of scalability.  It is particularly important with regard to memory
management.

Although we can go fairly far down the road by configuring swap space
and cache settings to utilize system resources as efficiently as
possible, those resources will stretch only so far on your 32-bit
system.

The central issue here is the fact that you've got all of your assets
stored in one very large (and soon to be much larger) document.  Even
though it is fragmented, if you are doing queries that require loading
all (or major portions of) the document's fragments into memory, you
will find yourself continually battling memory issues as the file grows.


A different approach, and one which would be much more efficient, would
be to store each asset in a different document.  You could then use
collections or directories to group assets for easy batch selection. In
addition, because locking is done at the document level, this is also a
better approach if you're doing frequent updates or insertions.  You'll
be able to lock at the level of <asset> rather than <assets>, avoiding
what can be a bottleneck.  To use an analogy from the relational
database world, the difference here is between having an ASSETS table
with table-level locking (in the single-document scenario), vs. one with
row-level locking (in the multiple-document scenario).
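As a sketch of the multiple-document approach described above, the single assets.xml file could be split into one document per asset using MarkLogic's xdmp:document-insert built-in. The /assets/{id}.xml URI scheme here is just an illustrative choice, not something specified in this thread:

(: split the monolithic file into one document per asset,
   placing each new document in the "allassets" collection :)
for $a in doc("/assets.xml")/assets/asset
return
  xdmp:document-insert(
    concat("/assets/", string($a/@id), ".xml"),
    $a,
    xdmp:default-permissions(),
    "allassets")

Each resulting document can then be locked and updated independently, which gives the row-level-locking behavior the analogy describes.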

2.  Cache settings:

As I mentioned last week, Mark Logic recommends installing MarkLogic
Server on a dedicated system. At install time, MarkLogic Server
automatically sizes various parameters with the assumption that all of
the system's hardware resources are dedicated to it. If you are going to
share a single server between multiple software applications, you may
need to adjust parameters in the server configuration. 

Database > configure: Each database has a configuration page that allows
you to tune cache sizes for caches allocated dynamically during database
updates. Depending on the pattern of updates and on system load, more
than one instance of these caches may be allocated for a given database
during database updates.

We recommend that you try halving the in-memory limits for the
actively-ingesting database (i.e., in-memory list size=64, in-memory tree
size=16, in-memory range index size=2). This will reduce the impact of
memory fragmentation, since you are more likely to find a contiguous
82-MB block of memory in your 32-bit address space.

You should restart the server after making this change, so that the
ingestion starts with a clean 3-GB address space.

In addition, if you have inactive project databases, they may occupy
significant address space, so it is worth drastically reducing the size
of these parameters for inactive databases, or even removing them if no
longer needed. 

3. Swap space:

Just to repeat the recommendation made last week, we highly recommend
increasing your swap space.  (See section 1.2 of our installation
guide: http://developer.marklogic.com/pubs/3.1/books/install.pdf.)
Note that the problem here has to do with allocation of *contiguous*
blocks; so if address space is really chopped up you may run into this
error before running completely out of memory.

We hope that the tips on swap space and cache settings will get you past
the immediate issue; but we do encourage you to think carefully about
the impact of storing your assets in one large file.   

Colleen Whitney
Mark Logic Corporation
2000 Alameda de las Pulgas
Suite 100
San Mateo, CA 94403
+1 650 655 2366 Phone
+1 650 655 2310 Fax
[EMAIL PROTECTED]
www.marklogic.com
 
 
 


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jason
Hunter
Sent: Sunday, May 13, 2007 12:27 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Stumped on comparing string
sequences

Hi Rashid,

I don't think you received a response, so let me jump in.

> As to my understanding, a collection is a bunch of documents. So 
> basically a document can belong to a collection, something like

Correct.

> Is there a possibility that I can define a filter on which nodes in a 
> document belong to a collection?

Membership in a collection is at the document level, not the node level.
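For example (a sketch assuming a hypothetical document URI of /persons.xml), you can attach the whole document to a collection with the xdmp:document-add-collections built-in, but there is no way to place only the CA <person> nodes in it:

(: adds the entire document, not individual nodes, to the collection :)
xdmp:document-add-collections("/persons.xml", "http://capersons.com")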

> <persons>
> 
> <person>
> 
> <name>John</name>
> 
> <state>CA</state>
> 
> </person>
> 
> <person>
> 
>          <name>Jack</name>
> 
>          <state>AZ</state>
> 
> </person>
> 
> </persons>
>
> Can I say all persons belonging to California should be a part of the
> http://capersons.com collection, and if yes, how can I do that?

You can do this without using collections.  For example:

/persons/person[state = "CA"][name = "John"]

Have you tried this?  How many person elements are you querying against 
in each file?

-jh-

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
