To my knowledge, wrapping the xs:hexBinary value in binary { } is the way to
create a real binary node, so your approach should already work. Did you check?
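As a quick sanity check, here is a minimal sketch (assuming MarkLogic's 1.0-ml dialect; the hex string is a made-up sample, the ASCII bytes of "Hello") that builds a binary node the same way and tests whether it really is one:

```xquery
xquery version "1.0-ml";
(: Sample hex value, made up for illustration: the ASCII bytes of "Hello" :)
let $hex := xs:hexBinary("48656C6C6F")
(: binary { } is MarkLogic's computed binary node constructor :)
let $bin := binary { $hex }
(: true when the constructor produced a binary() node rather than text :)
return $bin instance of binary()
```

If that returns true, passing $bin to xdmp:document-insert should store a binary document rather than an XML or text one.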



A small optimization would be to make batches of, let's say, about 100
records, build a map:map of each batch, and pass that to a spawned process
that inserts all 100. Right now you are queuing a separate task server task
for every record. The task server queue has a limit, and inserting in batches
of 100 documents usually works faster.



Here is a bit of sample ‘transaction’ code I copied from collector-feed.xqy
(https://github.com/marklogic/infostudio-plugins/blob/master/collectors/collector-feed.xqy):



    let $entries := …
    let $entry-count := count($entries)

    let $transaction-size := 100
    let $total-transactions := xs:integer(ceiling($entry-count div $transaction-size))

    (: create transactions by breaking the document set into maps;
       each map's documents are saved to the db in their own transaction :)
    let $transactions :=
        for $i at $index in 1 to $total-transactions
        let $map := map:map()
        let $start := (($i - 1) * $transaction-size) + 1
        let $finish := min((($start - 1 + $transaction-size), $entry-count))
        let $put :=
            for $entry in ($entries)[$start to $finish]
            let $id := fn:concat(fn:string($entry/atom:id), ".xml")
            return map:put($map, $id, $entry)
        return $map

    (: the callback function for ingest :)
    let $ingestion :=
        for $transaction at $index in $transactions
        return
            infodev:transaction($transaction, $ticket-id,
                xdmp:function(xs:QName("feed:process-file")), $policy-deltas, $index, (), ())



Replace $entries with $table/row, change let $id to use your URIs, and
replace that infodev:transaction call with your own spawn that takes an
entire map, loops over its keys, and does an insert for each key/value pair
in the map.
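A minimal sketch of such a spawned module, assuming it is saved as insert-batch.xqy where the task server can find it, and assuming your server version lets a map:map be passed as an external variable to xdmp:spawn. Both the module name and the variable name MAP are hypothetical:

```xquery
xquery version "1.0-ml";
(: insert-batch.xqy (hypothetical module name).
   $MAP (hypothetical variable name) maps each document URI to the node to
   store. Every pair is inserted within this one spawned task, so a batch
   of 100 records takes one task server slot instead of 100. :)
declare variable $MAP as map:map external;

for $uri in map:keys($MAP)
return xdmp:document-insert($uri, map:get($MAP, $uri))
```

From the loading query, each batch would then be queued with something like `xdmp:spawn('insert-batch.xqy', (xs:QName('MAP'), $map))`, once per map rather than once per row. Record elements and binary blob nodes can go into the same map, since xdmp:document-insert accepts either kind of node.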



Cheers,

Geert



*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Todd Gochenour
*Sent:* Saturday, 25 February 2012 7:10
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?



It's time for me to pick this project up now that the work week has
passed.



I'm attempting to implement Michael Blakeley's recommendation to move the
SQL blob content into its own document as part of this initial load/chunk
phase. Here's how I see the strategy. As I iterate through each record in
a table, when there is an element with the attribute
xsi:type="xs:hexBinary", I want to extract this data, generate a new
document, replace the original element with a reference to this new
document, and then spawn two 'document-insert.xqy' operations, one for the
original document and one for the binary document.



These are my current issues. I haven't figured out how to convert the
hexBinary into binary so that I get the correct format when I fetch the
document; I probably need to set the MIME type. Also, the @xsi:type attribute
isn't part of the table_data, so I can't trigger blob processing based on
this attribute. For now I'm only processing elements called file_blob.



My current working copy now looks like:



(: query console :)
xquery version "1.0-ml";
for $table in xdmp:document-get('C:\Users\servicelogix\slx\us_co_slx.xml')/*/*/table_data
  let $table-name := $table/@name/string()
  let $database-name := $table/../@name/string()
  for $row in $table/row
    let $record-uri := concat('/', $database-name, '/', $table-name, '/id-', $row/field[@name='id'])
    let $file-uri := concat('/', $database-name, '/', $table-name, '/file-', $row/field[@name='id'])
    (: turn the hex-encoded blob column, if present, into a binary node :)
    let $blob := if ($row/field[@name='file_blob'][1])
                 then binary { xs:hexBinary($row/field[@name='file_blob'][1]) }
                 else ()
    (: copy every non-empty field except the blob; names starting with a
       digit get a '_' prefix so they are valid element names :)
    let $record := element { $table-name } {
      $row/field[text() and not(@name='file_blob')]/element
        { if (number(substring(@name,1,1)) = number(substring(@name,1,1)))
          then concat('_', @name) else @name } { text() },
      (: reference to the separate binary document :)
      if ($blob) then element file_uri { $file-uri } else ()
    }
    (: one spawned insert for the record, one for the blob :)
    return (
      if ($record) then xdmp:spawn('document-insert.xqy',
        (xs:QName('URI'), $record-uri, xs:QName('NEW'), $record)) else (),
      if ($blob) then xdmp:spawn('document-insert.xqy',
        (xs:QName('URI'), $file-uri, xs:QName('NEW'), $blob)) else ()
    )