Another benefit to staging and then transforming in two separate steps is that 
your transformation might require data that hasn't yet been loaded, for example 
to denormalize values from related documents that may or may not have already 
arrived.
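
To illustrate the ordering problem, here is a minimal sketch in Python. It uses an in-memory dictionary as a stand-in for the database; it is not MarkLogic or mlcp code. The `dept_uri` and `dept_name` fields, the URIs, and the function names are all hypothetical.

```python
# Hypothetical in-memory stand-in for the database; not a MarkLogic API.
staged = {}


def stage(uri, doc):
    """Pass 1: store each document as-is, in whatever order it arrives."""
    staged[uri] = doc


def denormalize(doc, lookup):
    """Pass 2: copy a value from a related document into a copy of this one."""
    out = dict(doc)
    related = lookup.get(doc.get("dept_uri"))
    out["dept_name"] = related["name"] if related else None
    return out


# The employee document arrives before the department it refers to.
emp = {"emp_name": "ABC", "dept_uri": "/dept/10.json"}
stage("/emp/1.json", emp)
at_ingest = denormalize(emp, staged)      # related document not loaded yet

stage("/dept/10.json", {"name": "Insight and Data"})
after_load = denormalize(staged["/emp/1.json"], staged)  # lookup now succeeds
```

Transforming at ingest leaves `dept_name` empty here, while the second pass over the staged data fills it in.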

Justin


--
Justin Makeig
Director, Product Management
MarkLogic
jmak...@marklogic.com<mailto:jmak...@marklogic.com>

On Aug 31, 2016, at 7:06 AM, Dave Cassel 
<dave.cas...@marklogic.com<mailto:dave.cas...@marklogic.com>> wrote:

Hi Tim,

If you know at ingest time how you want to transform data, then from a 
performance point of view I think it's better to do it with the MLCP transform. 
Doing so means writing each fragment just once. There are some trade-offs:

Advantages of an mlcp transform:

  *   The data only need to be written once, instead of written and then 
updated. Doing the latter results in deleted fragments, which require merges to 
clean up.
  *   Once the data are in the database, they are fully ready for use; there is 
no need to separate freshly loaded data from data that are ready for use.

Advantages of load-as-is followed by a CORB job:

  *   Works if you don't yet know how you want to format the data: load, play, 
revise, repeat.
  *   Protection against errors in the transform: if your MLCP transform has a 
fatal error that affects only some documents, the whole batch will fail to get 
inserted. Good error handling can prevent this, but you may still need to 
account for not-fully-transformed documents.
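
The error-handling point can be sketched as follows. This is plain Python, not an mlcp transform (those are written in XQuery or server-side JavaScript). The `transform` rule and the `transformed` and `transform_error` fields are hypothetical, made up only to show the pattern.

```python
def transform(doc):
    """Hypothetical transform: fails on documents without a numeric salary."""
    out = dict(doc)
    out["salary"] = int(doc["salary"])
    return out


def safe_transform(doc):
    """Return the transformed document, or the original flagged as untransformed."""
    try:
        result = transform(doc)
        result["transformed"] = True
        return result
    except (KeyError, ValueError) as err:
        # Keep the document so the rest of the batch still loads,
        # but mark it so a later job can find and fix it.
        fallback = dict(doc)
        fallback["transformed"] = False
        fallback["transform_error"] = str(err)
        return fallback


batch = [{"id": "1", "salary": "3000"}, {"id": "4", "salary": "n/a"}]
loaded = [safe_transform(d) for d in batch]
pending = [d["id"] for d in loaded if not d["transformed"]]
```

Both documents are loaded, and `pending` lists the ids that still need a follow-up pass.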

Dave.

--
Dave Cassel<http://davidcassel.net/>, @dmcassel<https://twitter.com/dmcassel>
Technical Community Manager
MarkLogic Corporation<http://www.marklogic.com/>
http://developer.marklogic.com/


From: 
<general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>>
 on behalf of Timothy Taylor <timmy...@gmail.com<mailto:timmy...@gmail.com>>
Reply-To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Date: Monday, August 29, 2016 at 6:34 PM
To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] #CGO#How to ingest data of selected 
columns from CSV using MLCP & how to define and use primary key to see the log 
data (failed data in terms of Bad file)

Hey Dave,

Tim Taylor from the alliances team here. Subscribed from my personal email.

Any thoughts on whether an mlcp transform on the inbound side would perform 
better than loading as-is and running a CORB job to clean up afterwards?

Tim

Sent from my iPhone

On Aug 29, 2016, at 2:10 PM, Dave Cassel 
<dave.cas...@marklogic.com<mailto:dave.cas...@marklogic.com>> wrote:

You can write an MLCP 
transform<http://docs.marklogic.com/guide/mlcp/import#id_82518>. That should 
get the individual XML documents as input and your output can structure them 
however you want. This post on recursive 
descent<http://developer.marklogic.com/blog/xquery-recursive-descent> should 
help, too — you'll use that in your transform.
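
For readers unfamiliar with the recursive descent pattern, here is a rough Python analogue using the standard library's ElementTree. A real mlcp transform would be written in XQuery or server-side JavaScript, so treat this only as an illustration of the idea. The `root` element name and the one-element-per-column layout are assumptions about what the loaded documents look like.

```python
import xml.etree.ElementTree as ET

# Columns to leave out, taken from the example in this thread.
DROP = {"salary", "mobile_no"}


def descend(node):
    """Copy node recursively, skipping any child element whose tag is in DROP."""
    copy = ET.Element(node.tag, node.attrib)
    copy.text = node.text
    copy.tail = node.tail
    for child in node:
        if child.tag not in DROP:
            copy.append(descend(child))
    return copy


source = ET.fromstring(
    "<root><id>1</id><emp_name>ABC</emp_name><salary>3000</salary>"
    "<designation>X</designation><mobile_no>4444444444</mobile_no>"
    "<dependent>2</dependent></root>"
)
result = ET.tostring(descend(source), encoding="unicode")
```

The function visits every element once and decides per element whether to keep it, which is the same shape a transform would take.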

--
Dave Cassel<http://davidcassel.net/>, @dmcassel<https://twitter.com/dmcassel>
Technical Community Manager
MarkLogic Corporation<http://www.marklogic.com/>
http://developer.marklogic.com/


From: 
<general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>>
 on behalf of "Khan, Zishan" 
<zishan.k...@capgemini.com<mailto:zishan.k...@capgemini.com>>
Reply-To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Date: Monday, August 29, 2016 at 8:12 AM
To: MarkLogic Developer Discussion 
<general@developer.marklogic.com<mailto:general@developer.marklogic.com>>
Subject: [MarkLogic Dev General] #CGO#How to ingest data of selected columns 
from CSV using MLCP & how to define and use primary key to see the log data 
(failed data in terms of Bad file)

Hi Folks,

I am new to MarkLogic and would appreciate your help with a couple of 
challenges I have run into.
I am ingesting structured CSV data into MarkLogic as-is with MLCP, which 
produces XML documents in our database. Normal ingestion with MLCP works fine 
for any file format, but I am stuck on the following problems:

1.      How to ingest only the data for selected columns into MarkLogic, 
using MLCP or any other means.
2.      How to define and use a primary key to check the logs (say, for failed 
data).

To make my questions easier to understand, here is an example.

Input (in CSV format):

id   emp_name   salary   designation   mobile_no    dependent

1    ABC        3000     X             4444444444   2
2    DEF        4000     Y             2222222222   1
3    GHI        3000     X             3333333333   0
4    ABC        8000     Z             9999999999   2

Q.1)

Output (the actual output is in the default XML format; it is shown as a table 
here just for understanding):

id   emp_name   designation   dependent

1    ABC        X             2
2    DEF        Y             1
3    GHI        X             0
4    ABC        Z             2
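
One way to approach Q.1 "by any means" is to trim the CSV before it is handed to MLCP. Below is a minimal sketch using Python's standard `csv` module. It assumes a comma-delimited file with a header row; `select_columns` and `KEEP` are hypothetical names, and in-memory strings stand in for real files.

```python
import csv
import io

# Columns to keep, as in the desired output above.
KEEP = ["id", "emp_name", "designation", "dependent"]


def select_columns(src, dst, keep):
    """Copy CSV rows from src to dst, writing only the columns named in keep."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in keep are silently dropped


raw = io.StringIO(
    "id,emp_name,salary,designation,mobile_no,dependent\n"
    "1,ABC,3000,X,4444444444,2\n"
    "2,DEF,4000,Y,2222222222,1\n"
    "3,GHI,3000,X,3333333333,0\n"
    "4,ABC,8000,Z,9999999999,2\n"
)
trimmed = io.StringIO()
select_columns(raw, trimmed, KEEP)
```

With real files, `raw` and `trimmed` would be file objects opened with `newline=""`, and the trimmed file would then be loaded as usual.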

Q.2)

Suppose the input row with id = 4 fails to load.

Failed data (say):

4    ABC        Z             2

How can I look up this data using id as the primary key (I don't even know how 
to specify a primary key in MarkLogic), using MLCP or any other means?
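
As a sketch of the "bad file" idea outside MarkLogic, the rows can be checked before loading and the rejects kept in a lookup keyed by `id`. This is plain Python, not an MLCP feature. The validation rule is hypothetical and chosen only so that id 4 fails, as in the example.

```python
import csv
import io


def split_rows(src, is_valid):
    """Return (good, bad), where bad maps each rejected row's id to the row."""
    good, bad = [], {}
    for row in csv.DictReader(src):
        if is_valid(row):
            good.append(row)
        else:
            bad[row["id"]] = row
    return good, bad


def has_known_designation(row):
    # Hypothetical rule, chosen only so that id 4 is rejected.
    return row["designation"] in {"X", "Y"}


raw = io.StringIO(
    "id,emp_name,designation,dependent\n"
    "1,ABC,X,2\n"
    "2,DEF,Y,1\n"
    "3,GHI,X,0\n"
    "4,ABC,Z,2\n"
)
good, bad = split_rows(raw, has_known_designation)
failed = bad["4"]  # look up the rejected row by its id
```

The `bad` dictionary could be written out as a separate CSV to serve as the bad file.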

Any pointers to a solution, along with supporting material, would be much 
appreciated.


Thanks & Regards,
Zishan Khan / Capgemini
Associate Consultant | FSGBU | Insight and Data





_______________________________________________
General mailing list
General@developer.marklogic.com<mailto:General@developer.marklogic.com>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
