As Mike indicated, there is currently no way to extend mlcp (Content Pump) with the ability to validate documents as they arrive, or do any other arbitrary processing as part of a loading task. We’re looking at how we might support this in a future release, so additional input would be appreciated.
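Until something like that exists, one workaround is to validate files before they ever reach mlcp. Here is a minimal sketch of that idea using the JDK's built-in `javax.xml.validation` package. The class name `PreloadValidator`, its method names, and the three command-line arguments are illustrative assumptions of mine and not part of mlcp or any MarkLogic tool.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: copies only schema-valid XML files into a staging
// directory, which a separate loading job could then pick up.
public class PreloadValidator {

    /** Compiles the XSD once; the resulting Schema can be reused for every document. */
    public static Schema compile(Source xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(xsd);
    }

    /** Returns null if the document is valid, otherwise the first error message. */
    public static String validate(Schema schema, Source xml) throws IOException {
        // A Validator is not thread-safe, so create a fresh one per document.
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, errors and fatal errors are thrown as SAXException.
            validator.validate(xml);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: PreloadValidator <schema.xsd> <inputDir> <stagingDir>");
            System.exit(2);
        }
        Schema schema = compile(new StreamSource(new File(args[0])));
        Path staging = Files.createDirectories(Paths.get(args[2]));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[1]), "*.xml")) {
            for (Path file : files) {
                String error = validate(schema, new StreamSource(file.toFile()));
                if (error == null) {
                    // Valid: stage the file for loading.
                    Files.copy(file, staging.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } else {
                    // Invalid or malformed: report it and leave it out of the staging directory.
                    System.err.println("INVALID " + file + ": " + error);
                }
            }
        }
    }
}
```

Only the files that pass end up in the staging directory, so whatever loads from there (mlcp included) never sees the rejects. This does the validation on a single machine before the load, so it gives up some of the distribution you would otherwise get during ingestion.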
Information Studio supports schema validation today. In an Information Studio flow you can configure a transformation step to validate against a Schemas database as documents arrive. See the docs at <http://docs.marklogic.com/guide/infostudio/loadingContent>. With Information Studio you won’t get the ability to distribute the actual loading across many nodes, like you would with mlcp, but Information Studio provides a convenient user interface and the ability to build custom collectors and transformation steps.

If you’re set on mlcp, I think pre-processing in some other tool or implementing a CPF (or a lower-level trigger-based) approach is probably your best bet.

Justin

Justin Makeig
Director, Product Management
MarkLogic Corporation
[email protected]
Phone: +1 650 655 2387
www.marklogic.com

On Nov 12, 2012, at 10:24 AM, Michael Blakeley <[email protected]> wrote:

> I don't see anything relevant at
> http://docs.marklogic.com/guide/ingestion/content-pump - but mlcp is designed
> to work with hadoop. Possibly you could validate the XML in hadoop tasks?
> Also mlcp is open-source, so you could always patch it to do what you want.
>
> RecordLoader would do this using a CONTENT_MODULE_URI written in XQuery, and
> invoked via XCC or HTTP requests. See
> http://marklogic.github.com/recordloader/ for details.
>
> Since we know from your other email that you are thinking of using CPF, you
> might also consider using the CPF validation pipeline.
>
> -- Mike
>
> On 12 Nov 2012, at 01:39 , sini narayanan <[email protected]> wrote:
>
>> Hi,
>>
>> I have a requirement where I need to use content pump to load the files into
>> the MarkLogic DB. While loading contents, I need to make sure that the input
>> xml file conforms to the schema. Is it possible to perform a strict schema
>> validation on the xml files, while inserting them through content pump?
>>
>> Please help…
>>
>> Thanks,
>>
>> Sini
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
