Alan,

I would expect your use-case to work, as long as RecordLoader is in ID_NAME=#FILENAME mode (which is the default). I tried to create a test case following your report, but I can't reproduce the problem.

Maybe you can tell me what differs between our inputs or configurations? Are you setting any extra RecordLoader properties? Are you using the latest recordloader.jar?

$ cat test-ad.xml
<AuthEnty affiliation="&quot;Carleton University. Data Centre&quot; "/>
$ recordloader.sh test-ad.xml
RecordLoader starting, version 2008-08-08.1 on 1.6.0_07
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader initConfiguration
INFO: Configuration is com.marklogic.recordloader.xcc.XccConfiguration
logging to CONSOLE
logging to file simplelogger-%u-%g.log
20-Aug-2008 2:06:36 PM com.marklogic.ps.SimpleLogger configureLogger
INFO: setting up logging for: com.marklogic.ps
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration setIdNodeName
WARNING: no ID_NAME specified: using #FILENAME
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration setUseFilenameIds
INFO: generating ids from file names
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureOptions
INFO: using output encoding UTF-8
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureOptions
INFO: QUEUE_CAPACITY = 1000
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureCollections
INFO: adding extra collection: com.marklogic.ps.RecordLoader.1219266396593
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configure
INFO: connecting to xcc://admin:[EMAIL PROTECTED]:9000/
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader <init>
INFO: RecordLoader starting, version 2008-08-08.1 on 1.6.0_07
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader checkEnvironment
INFO: XPP3 version = 1.1.4c
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader getDecoder
INFO: using input encoding UTF-8
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader getDecoder
INFO: using malformed input action REPORT
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: zipFiles.size = 0
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: xmlFiles.size = 1
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: thread count = 1
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.LoaderFactory <init>
INFO: Loader is com.marklogic.recordloader.FileLoader
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: populating queue
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration getContentFactoryConstructor
INFO: ContentFactory is com.marklogic.recordloader.xcc.XccContentFactory
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration getContentFactoryConstructor
INFO: client = 3.2-7, server = 3.2-7
20-Aug-2008 2:06:37 PM com.marklogic.recordloader.Monitor run
INFO: loaded 1 records ok (72 B in 0.50017991 s, 2 tps, 0 kB/s), with 0 error(s)

If I go to cq, I can see the contents:

doc('test-ad.xml')
=>
<AuthEnty affiliation="&quot;Carleton University. Data Centre&quot;"/>
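
As a sanity check outside MarkLogic, here is a minimal sketch that parses the same one-line document with a stock XML parser. It assumes Python is available; the variable names are mine and nothing in it is part of RecordLoader. It suggests the input is already well-formed, since the predefined &quot; entity is legal inside an attribute value, and it confirms the 72 B size that RecordLoader reported above.

```python
import xml.etree.ElementTree as ET

# The one-line test document from the transcript above, including the
# trailing space inside the attribute value and the final newline.
doc = '<AuthEnty affiliation="&quot;Carleton University. Data Centre&quot; "/>\n'

# Size in bytes, to compare against the "72 B" in the RecordLoader log.
print(len(doc.encode("utf-8")))

# A conforming parser expands &quot; to a literal double quote and raises
# ParseError only if the document is not well-formed.
root = ET.fromstring(doc)
print(repr(root.get("affiliation")))
```

If your file fails the same check, the difference is probably in the input rather than in the loader settings.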

-- Mike

(in case it's useful...)

$ cat /home/mblakele/bin/recordloader.sh
#!/bin/sh
#

BASE=`dirname $0`

CP=$HOME/lib/java/recordloader.jar
CP=$CP:$HOME/lib/java/xcc.jar
CP=$CP:$HOME/lib/java/xpp3.jar

FILES=
VMARGS=

for a in "$@"; do
    if [ -e "$a" ]; then
        FILES="$FILES $a"
    else
        VMARGS="$VMARGS $a"
    fi
done

if [ -d "$JAVA_HOME" ]; then
  JAVA=$JAVA_HOME/bin/java
else
  JAVA=java
fi

$JAVA -cp $CP $VMARGS com.marklogic.ps.RecordLoader $FILES

# end recordloader.sh


Alan Darnell wrote:
I'm trying to load some documents that come to me with the following error:

<AuthEnty affiliation="&quot;Carleton University. Data Centre&quot; ">

I know that entities are not allowed in attributes but I can't change the program that generates these files.

I want to load them into MarkLogic and have ML take care of repairing them. When I use:

xdmp:document-load ("/odesi/slid_75M0010_E_2003ke.xml",
<options xmlns="xdmp:document-load">
       <repair>full</repair>
       <permissions>{xdmp:default-permissions()}</permissions>
</options>)

the documents load and the entities are removed.  Great.

But I'd like to use recordloader for this task -- particularly in conjunction with the AutoLoader program.

When running RecordLoader I'm setting the XML_REPAIR_LEVEL=FULL property, as in:

java -cp recordloader.jar:xcc.jar:xpp3-1.1.4c.jar -DXML_REPAIR_LEVEL=FULL

but when RecordLoader tries to load the documents with these errors, instead of correcting them it says:

SEVERE: exception
com.marklogic.xcc.exceptions.XQueryException: XDMP-STARTTAGCHAR: Unexpected character "U" in start tag at cipo_912_1_E_1989-12/481 line 12
in /insert

Line 12 of that document has this in it:

<AuthEnty affiliation="&quot;Carleton University. Data Centre&quot; ">


Any ideas or help would be appreciated.  Thanks in advance.

Alan


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
