Re: [MarkLogic Dev General] Update: replace vs insert

Geert Josten Thu, 21 Nov 2013 02:38:33 -0800

Hi Guoliang,

As Mike said earlier, there is not much difference in a technical sense. It
is mostly a question of what is more convenient for you as a developer. If you
are only inserting or replacing a few nodes in a large XML document, it is
easier to use the node functions than to replicate the entire document with
recursive functions. MarkLogic Server keeps track of all node
manipulations, more or less applies the changes in memory under the hood,
and inserts the changed fragment into the database at the end of the
request/transaction. But sometimes, if you make large changes to a
document, it is easier to recreate the entire document and insert it
as one piece, which is likewise committed at the end of the
request/transaction.
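
To make the two styles concrete, here is a minimal sketch. The URI /orders/1001.xml and the <order>/<status> element names are hypothetical, and it assumes the document exists with exactly one <status> child; only xdmp:node-replace and xdmp:document-insert are actual MarkLogic functions.

```xquery
xquery version "1.0-ml";

(: Style 1: node-level update. Only the node to change is named;
   the server rewrites the fragment at commit time. :)
declare function local:ship-by-node-update($uri as xs:string)
as empty-sequence()
{
  xdmp:node-replace(fn:doc($uri)/order/status, <status>shipped</status>)
};

(: Style 2: rebuild the whole document and insert it at the same URI.
   Note that <status> ends up as the last child in this sketch. :)
declare function local:ship-by-document-insert($uri as xs:string)
as empty-sequence()
{
  let $old := fn:doc($uri)/order
  return xdmp:document-insert($uri,
    <order>{
      $old/@*,
      $old/* except $old/status,
      <status>shipped</status>
    }</order>)
};

(: Call one or the other, not both: two updates to the same document
   in one transaction would be expected to conflict. :)
local:ship-by-node-update("/orders/1001.xml")
```

Either way the change is committed at the end of the request, so the choice is mainly about which form is easier to write and maintain.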

Kind regards,

Geert

*From:* [email protected] [mailto:
[email protected]] *On Behalf Of* Guoliang Li
*Sent:* Thursday, 21 November 2013 10:23
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Update: replace vs insert

Thanks a lot :)

I read some documentation today, but I am still not very clear about the
difference between replace and insert.

We're building an application that will track all historical
data. As far as I know, both node-replace and insert can
update existing data.

For node-replace, MarkLogic will create a new version of the
fragment.
For insert, if the URI already exists, it will replace the content.

So my question is: what is the difference? And is there any
recommendation on which one to use for updating?

Thanks very much.

Regards,
Guoliang

On Nov 21, 2013 2:26 AM, "Michael Blakeley" <[email protected]> wrote:

Every document has a URI, and that URI acts as a primary key. A call to
doc($uri) is the fastest way to retrieve a document, so choose your primary
key with that in mind. Ask yourself what might make sense.

The primary key often comes from the XML itself, but not always. For
example you might receive documents from each of a number of sources, each
sending one document daily. In your application you might use that date +
source key as your primary document reference. In that case it might make
sense to create URIs like YYYY-MM-DD-SOURCE or SOURCE-YYYY-MM-DD.

Document URIs can also be organized in a directory structure, created
implicitly by any '/' characters. So if documents were structured by date
and source with one document per day as described above, the URI
/YYYY/MM/DD/SOURCE would allow efficient access by year, year-month, and
year-month-day via xdmp:directory, as well as date-source via fn:doc.

You might also consider putting SOURCE first in this structure:
/SOURCE/YYYY/MM/DD. Ask yourself which way would be more useful. What are
the common access patterns? The point is to design the URIs to support your
application and the way it works with the data.
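
As a rough sketch of the /SOURCE/YYYY/MM/DD layout: the function name local:daily-uri and the source name "sensor-a" are made up for illustration, and the example assumes fn:format-date is available in your server version.

```xquery
xquery version "1.0-ml";

(: Build a URI of the form /SOURCE/YYYY/MM/DD from a source key and a date. :)
declare function local:daily-uri($source as xs:string, $date as xs:date)
as xs:string
{
  fn:concat("/", $source, "/",
    fn:format-date($date, "[Y0001]/[M01]/[D01]"))
};

(
  (: Should yield "/sensor-a/2013/11/21". :)
  local:daily-uri("sensor-a", xs:date("2013-11-21")),

  (: Count everything from one source for one month. The directory URI
     must end in "/"; "infinity" includes the day-level children. :)
  fn:count(xdmp:directory("/sensor-a/2013/11/", "infinity"))
)
```

With SOURCE first, per-source queries by year or month map onto a single xdmp:directory call. With the date first, the same is true for "all sources on a given day". Which one to pick depends on the access pattern.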

While an ideal URI is based on application access patterns, there are cases
where documents simply do not have a natural primary key. For those
situations you might consider xdmp:random or xdmp:hash64. Along with
http://markmail.org/message/mm5vtacpdzwfy44j you might find this code
useful:

(: Return a 32-digit hex id from the inputs. :)
declare function local:generate-uuid2(
  $x as xs:unsignedLong,
  $y as xs:unsignedLong)
as xs:string
{
   let $x := xdmp:integer-to-hex($x)
   let $y := xdmp:integer-to-hex($y)
   return concat(
     substring('0000000000000000', 1, 16 - string-length($x)), $x,
     substring('0000000000000000', 1, 16 - string-length($y)), $y)
};

(: Generate a new id, without reserving it. :)
declare function local:new()
as xs:string
{
  let $id := local:generate-uuid2(xdmp:random(), xdmp:random())
  let $uri := '/sensor-data/'||$id
  return (
    (: Collisions should be very rare, but handle them anyway. :)
    if (exists(doc($uri))) then local:new()
    else ($id, xdmp:lock-for-update($uri)))
};

local:new()
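
For the xdmp:hash64 route, a comparable sketch follows. The function name local:hashed-uri and the key format are hypothetical, and the /sensor-data/ prefix is simply reused from the code above. Unlike xdmp:random, the result is deterministic: the same key always maps to the same URI, so inserting again with the same key replaces the existing document.

```xquery
xquery version "1.0-ml";

(: Derive a 16-digit hex id from a string key and turn it into a URI. :)
declare function local:hashed-uri($key as xs:string)
as xs:string
{
  let $hex := xdmp:integer-to-hex(xdmp:hash64($key))
  return fn:concat(
    "/sensor-data/",
    (: Left-pad with zeros to a fixed width of 16 hex digits. :)
    fn:substring("0000000000000000", 1, 16 - fn:string-length($hex)),
    $hex)
};

(: Hypothetical key: source name plus timestamp. :)
local:hashed-uri("sensor-a|2013-11-21T10:23:00")
```

A 64-bit hash can collide in principle, so this is only reasonable when an occasional overwrite is acceptable or is checked for separately.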

-- Mike

On 20 Nov 2013, at 00:20 , Guoliang Li <[email protected]> wrote:

> Thanks Mike.
>
> It seems there's no big difference between these two queries. I got
> this from the dev guide:
> "in the case of modifying a document, MarkLogic Server creates new
> versions of the fragments involved in the operation."
>
> So I'll treat a document as a row of a table, and the primary key will
> be the URI, right? However, I cannot find a good way to generate
> the URI. Any suggestions? Thanks.
>
> Regards,
> GL
> On Nov 20, 2013 12:20 PM, "Michael Blakeley" <[email protected]> wrote:
> Broadly speaking there isn't much difference. Functions for node-level
> update are mostly just a convenience for developers. Internally, both end
> up doing much the same thing. An update writes an entire XML tree, plus the
> term-list entries for any relevant indexes.
>
> That might sound inefficient if you are thinking about documents as
> tables. But in most cases documents act more like rows. Try to design that
> way, with a primary key as the document URI.
>
> -- Mike
>
> On 19 Nov 2013, at 19:36 , Guoliang Li <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm totally new to MarkLogic.
> > I know I can update a field with node-replace or document-insert.
> > May I know the difference in terms of performance? Thanks.
> >
> > By the way, our app will keep all historical data.
> >
> > Regards,
> > GL
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general
