RE: [MarkLogic Dev General] Difference between eNode and Data Node

Danny Sokolsky Tue, 24 Mar 2009 14:39:55 -0700

Hi Geert,

You are definitely barking up the right tree...


As your content grows, you add more forests.  As your number of forests
grows, you add more d-nodes to host those forests.  There are some
guidelines about forest sizes in the Scalability documentation
(http://developer.marklogic.com/pubs/4.0/books/cluster.pdf) -- see page
7.  The executive summary is that, as a rule of thumb (there are always
exceptions), once a forest reaches 200GB or 32 million fragments (on a
64-bit system), you should add another forest.
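
As a rough illustration of that rule of thumb, here is a minimal Python
sketch that estimates how many forests a database might need.  The
function name estimate_forest_count, the constants, the use of decimal
gigabytes, and the assumption that content spreads evenly across forests
are all illustrative choices, not anything taken from the MarkLogic
documentation.

```python
# Rule-of-thumb limits per forest quoted above (64-bit system).
# These are guidelines, not hard limits; there are always exceptions.
MAX_FOREST_BYTES = 200 * 10**9       # 200GB, assumed to mean decimal gigabytes
MAX_FOREST_FRAGMENTS = 32_000_000    # 32 million fragments


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def estimate_forest_count(content_bytes, fragment_count):
    """Hypothetical helper: smallest number of forests that keeps each
    forest at or under both limits, assuming content is spread evenly."""
    by_size = ceil_div(content_bytes, MAX_FOREST_BYTES)
    by_fragments = ceil_div(fragment_count, MAX_FOREST_FRAGMENTS)
    # A database always needs at least one forest.
    return max(1, by_size, by_fragments)


if __name__ == "__main__":
    # 500GB of content in 40 million fragments: size is the binding limit.
    print(estimate_forest_count(500 * 10**9, 40_000_000))  # prints 3
```

Whichever limit is hit first drives the estimate, which is why the
sketch takes the larger of the two counts.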

The system is designed to scale this way.  The system mixes the content
across the forests (although it also allows you to choose which forest
to put content in).  This design allows it to scale to extremely large
systems.

-Danny

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert
Josten
Sent: Tuesday, March 24, 2009 2:09 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode and Data
Node

Hi Danny,

Yes, it certainly makes things more clear! At least to me..

One additional question that is more or less in line with this topic:
how should one handle large amounts of data? Just putting all in one
forest doesn't make sense. Handling content in different databases isn't
always an option either and perhaps not very efficient when it is
necessary to be able to search over all content.

Is it sufficient to just create multiple forests and assign those to one
database? How is content divided over these forests? Or am I barking up
the wrong tree? I didn't find much documentation on this particular
field.

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Danny Sokolsky
> Sent: Tuesday, 24 March 2009 21:44
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Difference between eNode 
> and Data Node
> 
> Yes, any e-node host in the cluster can access any database in the
> cluster.  Think about the d-nodes as hosting forests.  The database is
> an abstraction on top of some number of forests.  The e-nodes host
> interfaces into the databases, which are App Servers (HTTP, XDBC, and
> WebDAV).
> 
> As an example, consider a 3-node cluster (this is a bit simplified):
> 
> host1: e-node group, has an HTTP server on port 80 talking to database
> d1.  It has no forest assignments.
> 
> host2: d-node group, has forest f1, which is attached to database d1.
> 
> host3: d-node group, has forest Security, which is the forest for the
> security db for the cluster. 
> 
> In order to access content in database d1, you must come in through
> the App Server on host1.  A request such as doc("/foo.xml") is
> processed (roughly) as follows:
> 
> * an HTTP request to run the code doc("/foo.xml") is submitted to the
> HTTP server on host1 (for example, there might be an XQuery file named
> doc.xqy under the app server root with the code that is accessed by an
> HTTP request:  http://host1/doc.xqy).
> * host1 (the evaluator node) gets the HTTP request and parses the
> XQuery.
> * this request requires forest data, so host1 asks the forests for
> this database (only one in this example) for the data needed (the
> document /foo.xml in this example).  This communication happens over
> xdqp.
> * host2 responds by sending the forest data for /foo.xml back to host1
> (via xdqp).
> * host1 then processes the result and returns it to the client.
> 
> As I said, this is a bit simplified.  For example, the host that is
> hosting the Security forest also gets involved, but its involvement is
> a little more complex and is not really important for understanding how
> the e-node / d-node communication takes place.  Suffice it to say that
> every request is authenticated and a user can only see and do what he
> is authorized to.
> 
> There are other simplifications too.  But hopefully that gives you a
> better idea of how it works.
> 
> -Danny
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Geert
> Josten
> Sent: Tuesday, March 24, 2009 1:06 PM
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Difference between eNode and Data
> Node
> 
> Hi Danny,
> 
> How are Databases in the d-group accessed from the e-nodes in the
> e-group? Is it sufficient that both groups are defined in the same
> cluster? Databases are configured outside the Group configuration. Is
> a d-group created by only assigning hosts connected to a forest, and
> assigning all other hosts to the e-group which contains the needed app
> servers?
> 
> It would be helpful if you could give a more explicit example. Is that
> possible without writing a lengthy email?
> 
> Kind regards,
> Geert
> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of 
> > Danny Sokolsky
> > Sent: Tuesday, 24 March 2009 17:52
> > To: General Mark Logic Developer Discussion
> > Subject: RE: [MarkLogic Dev General] Difference between eNode 
> > and Data Node
> > 
> > Hi Saptarshi,
> > 
> >  
> > 
> > There is no requirement that an e-node must also have a 
> > forest attached to it.  In fact, in large implementations, 
> > the norm is to configure e-nodes to do only e-node work and 
> > d-nodes to do only d-node work.  That is what Groups are for. 
> >  You might, for example, set up 2 groups, one for e-nodes and 
> > one for d-nodes.  The d-node groups do not need to have any 
> > app servers on them, and the e-node groups do not need any 
> > databases or forests.  This means that each node can devote 
> > its entire life (and all of its resources) to its role.  For 
> > example, if you have a group that only has d-nodes, you do 
> > not need to allocate much expanded tree cache (that is used 
> > for e-node processing).  Similarly, if a group is only 
> > e-nodes, they do not need to allocate much list cache or 
> > compressed tree cache.   Be extra careful when changing these 
> > values, however, and make sure you know what role your hosts 
> > are playing.
> > 
> >  
> > 
> > Hosts in a MarkLogic Server cluster communicate via the xdqp 
> > protocol, which is an internal communication mechanism.  Any 
> > changes to the cluster are communicated to the other hosts 
> > via xdqp, and forest data is transferred to the e-node via 
> > xdqp.  All of this communication happens automatically.
> > 
> >  
> > 
> > Hope that helps,
> > 
> > -Danny
> > 
> >  
> > 
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of 
> > Saptarshi Newyork
> > Sent: Monday, March 23, 2009 10:08 PM
> > To: General Mark Logic Developer Discussion
> > Subject: RE: [MarkLogic Dev General] Difference between eNode 
> > and Data Node
> > 
> >  
> > 
> > Hi All,
> > 
> > Thanks for the great description and examples. I still have a
> > couple of questions to add:
> > 
> >  
> > 
> > 1) I understand that the same node can work as both eNode and
> > dNode, but if I want to have separate eNodes and dNodes, is
> > there any difference in the host configuration for these two
> > kinds of node?
> > 
> >  
> > 
> > 2) In an architecture where both eNode and dNode exist,
> > suppose a request comes to an eNode which requires access to a
> > forest. It is written that the eNode will send the request to
> > the dNode to access the forest. But every evaluator node (eNode)
> > is also attached to some forests. How is this transfer of the
> > request achieved? How can an eNode make a call to a dNode?  Is
> > there any configuration or coding required to achieve this?
> > Can an eNode access its own forest under any scenario?
> > 
> >  
> > 
> > Thanks in advance.
> > 
> > regards,
> > 
> > Saptarshi
> > 
> > --- On Mon, 3/23/09, Danny Sokolsky <[email protected]> wrote:
> > 
> >     
> >     From: Danny Sokolsky <[email protected]>
> >     Subject: RE: [MarkLogic Dev General] Difference between eNode and Data Node
> >     To: "General Mark Logic Developer Discussion" <[email protected]>
> >     Date: Monday, March 23, 2009, 4:18 PM
> > 
> >     Hi Geert,
> >     
> >     Thanks for the great description.  I will just add one thing to
> >     what you said:
> >
> >     Whether a host acts as an e-node or a d-node depends on what it is
> >     doing at the time, and a given host in a MarkLogic cluster can
> >     behave as an e-node, a d-node, or both.  For example, if you have a
> >     single host instance of MarkLogic Server, that host acts as both
> >     the e-node (to evaluate XQuery) and as the d-node (to perform
> >     forest operations on content).
> >     
> >     -Danny
> >     
> >     -----Original Message-----
> >     From: [email protected]
> >     [mailto:[email protected]] On Behalf Of Geert Josten
> >     Sent: Monday, March 23, 2009 12:59 PM
> >     To: General Mark Logic Developer Discussion
> >     Subject: RE: [MarkLogic Dev General] Difference between eNode and Data Node
> >     
> >     Saptarshi,
> >     
> >     I am not an authority on this matter either, but I will try to
> >     explain as well as possible.
> >     
> >     1) MarkLogic Server is designed to operate with evaluator nodes
> >     and database nodes. The database nodes access content stored in
> >     forests and perform search queries over the forests. The evaluator
> >     nodes are responsible for executing the XQuery code, WebDAV
> >     requests, XDBC calls, etc. If the code to be executed doesn't
> >     access any content stored in the database (no cts:search calls, no
> >     doc statements, etc.), but relies purely on content constructed in
> >     memory, then database nodes are not accessed. It has nothing to do
> >     with caching of any kind; it is just that content can be
> >     constructed on the fly, for instance by simply incorporating it in
> >     the XQuery script. The example Eric supplied is valid.
> >     
> >     2) MarkLogic Server does not handle failover when filesystems
> >     crash. The documentation
> >     (http://developer.marklogic.com/pubs/4.0/books/cluster.pdf)
> >     explains that filesystem crashes should be handled by using a
> >     clustered filesystem. There are some suggestions in that document,
> >     but I can imagine that a RAID configuration might suffice for
> >     simple situations as well. Forest-level failover works as follows:
> >     you assign multiple hosts to one physically shared forest. These
> >     hosts are listed in order. If the 1st host drops out, the 2nd host
> >     takes that forest over. Replication of data is not necessary that
> >     way, making it more efficient and much more scalable. At the
> >     front end you also have the HTTP servers etc. on the hosts. You
> >     can have as many as you like. By putting a hardware or software
> >     load balancer in front you can distribute calls coming in at a
> >     single port to all available 'evaluator' nodes. Load balancing is
> >     not handled by MarkLogic Server itself; there are plenty of
> >     solutions readily available, so why bother. ;-)
> >     
> >     I am not sure whether an HTTP server is the actual evaluator node,
> >     but I don't think so. There is this Task Server configuration page
> >     within the MarkLogic Server Group Administration. This configures
> >     Task threads on all hosts within a single group. I have the
> >     impression these act as evaluator nodes and the Databases in the
> >     MarkLogic Server Administration correspond to the database nodes.
> >     Forest-level failover is configured at the Forest configuration
> >     pages.
> >     
> >     I hope this makes things clearer to you!
> >     
> >     Kind regards,
> >     Geert
> >     
> >     
> >     
> >     Drs. G.P.H. Josten
> >     Consultant
> >     
> >     
> >     http://www.daidalos.nl/
> >     Daidalos BV
> >     Source of Innovation
> >     Hoekeindsehof 1-4
> >     2665 JZ Bleiswijk
> >     Tel.: +31 (0) 10 850 1200
> >     Fax: +31 (0) 10 850 1199
> >     http://www.daidalos.nl/
> >     KvK 27164984
> >     The information sent in or with this email message originates
> >     from Daidalos BV and is intended solely for the addressee. If you
> >     have received this message unintentionally, we ask that you
> >     delete it. No rights can be derived from this message.
> >     
> >     
> >     > From: [email protected]
> >     > [mailto:[email protected]] On Behalf Of
> >     > Saptarshi Newyork
> >     > Sent: Monday, 23 March 2009 12:30
> >     > To: [email protected]
> >     > Subject: [MarkLogic Dev General] Difference between eNode and
> >     > Data Node
> >     >
> >     > Hi,
> >     > I have a few questions:
> >     >
> >     > 1)  What is the difference between eNode and dNode? I have
> >     > read that E-nodes are required to evaluate XQuery programs,
> >     > XCC/XDBC requests, WebDAV requests, and other server
> >     > requests, and dNodes are those which talk directly with the
> >     > database/forest. It is also said that if the request does not
> >     > need any forest data to complete, then an e-node request is
> >     > evaluated entirely on the e-node. I do not understand how
> >     > this is possible! If an eNode is meant for XQuery evaluation
> >     > and XQuery needs XML to process, then every eNode request
> >     > should talk to a dNode. Is there any caching mechanism? It
> >     > would be great if anybody could explain this to me.
> >     >
> >     >
> >     >
> >     > 2) There are two failover mechanisms explained in the
> >     > documentation: forest-level failover and eNode-level
> >     > failover. It seems that forest data-level failover is not
> >     > handled by MarkLogic. For example, if the filesystem crashes,
> >     > is there any way by which MarkLogic Server replicates the
> >     > forest to other hosts in the same or a different cluster? If
> >     > this feature is not presently supported, when can we expect
> >     > it on the roadmap?
> >     >
> >     >
> >     >
> >     > Thanks in advance.
> >     >
> >     >
> >     >
> >     > regards,
> >     >
> >     > Saptarshi
> >     >
> >     >
> >     
> >     _______________________________________________
> >     General mailing list
> >     [email protected]
> >     http://xqzone.com/mailman/listinfo/general
