Alan,
I would expect your use-case to work, as long as RecordLoader is in
ID_NAME=#FILENAME mode (which is the default). I tried to create a test
case following your report, but I can't reproduce the problem.
Maybe you can tell me what differs between our inputs or configurations?
Are you setting any extra RecordLoader properties? Are you using the
latest recordloader.jar?
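One way to rule out configuration differences is to set both properties named in this thread explicitly. The sketch below only prints the command line rather than running it. It assumes the three jars sit in the current directory, that ID_NAME can be passed as a -D system property in the same way as XML_REPAIR_LEVEL, and that connection settings are left at their defaults. recordloader_cmd is a hypothetical helper, not part of RecordLoader.

```shell
#!/bin/sh
# Sketch only: print (do not run) a RecordLoader command line with the
# two properties from this thread set explicitly.
# Assumption: the jars are in the current directory; adjust CP as needed.
CP=recordloader.jar:xcc.jar:xpp3.jar

# Hypothetical helper: echoes the java command for the given input files.
recordloader_cmd() {
    # '#FILENAME' is quoted so the shell never treats '#' as a comment.
    echo java -cp "$CP" '-DID_NAME=#FILENAME' -DXML_REPAIR_LEVEL=FULL \
        com.marklogic.ps.RecordLoader "$@"
}

recordloader_cmd test-ad.xml
```

If the printed command matches what you are running and the load still fails, the difference is more likely in the input file or the jar version than in the properties.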
$ cat test-ad.xml
<AuthEnty affiliation=""Carleton University. Data Centre" "/>
$ recordloader.sh test-ad.xml
RecordLoader starting, version 2008-08-08.1 on 1.6.0_07
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader initConfiguration
INFO: Configuration is com.marklogic.recordloader.xcc.XccConfiguration
logging to CONSOLE
logging to file simplelogger-%u-%g.log
20-Aug-2008 2:06:36 PM com.marklogic.ps.SimpleLogger configureLogger
INFO: setting up logging for: com.marklogic.ps
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration setIdNodeName
WARNING: no ID_NAME specified: using #FILENAME
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration setUseFilenameIds
INFO: generating ids from file names
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureOptions
INFO: using output encoding UTF-8
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureOptions
INFO: QUEUE_CAPACITY = 1000
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configureCollections
INFO: adding extra collection: com.marklogic.ps.RecordLoader.1219266396593
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration configure
INFO: connecting to xcc://admin:[EMAIL PROTECTED]:9000/
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader <init>
INFO: RecordLoader starting, version 2008-08-08.1 on 1.6.0_07
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader checkEnvironment
INFO: XPP3 version = 1.1.4c
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader getDecoder
INFO: using input encoding UTF-8
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader getDecoder
INFO: using malformed input action REPORT
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: zipFiles.size = 0
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: xmlFiles.size = 1
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: thread count = 1
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.LoaderFactory <init>
INFO: Loader is com.marklogic.recordloader.FileLoader
20-Aug-2008 2:06:36 PM com.marklogic.ps.RecordLoader run
INFO: populating queue
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration getContentFactoryConstructor
INFO: ContentFactory is com.marklogic.recordloader.xcc.XccContentFactory
20-Aug-2008 2:06:36 PM com.marklogic.recordloader.Configuration getContentFactoryConstructor
INFO: client = 3.2-7, server = 3.2-7
20-Aug-2008 2:06:37 PM com.marklogic.recordloader.Monitor run
INFO: loaded 1 records ok (72 B in 0.50017991 s, 2 tps, 0 kB/s), with 0 error(s)
If I go to cq, I can see the contents:
doc('test-ad.xml')
=>
<AuthEnty affiliation=""Carleton University. Data Centre""/>
-- Mike
(in case it's useful...)
$ cat /home/mblakele/bin/recordloader.sh
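#!/bin/sh
#
# Wrapper for RecordLoader: arguments that name existing paths are passed
# as input files; everything else goes to the JVM (e.g. -DNAME=value).
BASE=`dirname $0`
CP=$HOME/lib/java/recordloader.jar
CP=$CP:$HOME/lib/java/xcc.jar
CP=$CP:$HOME/lib/java/xpp3.jar
FILES=
VMARGS=
for a in "$@"; do
    if [ -e "$a" ]; then
        FILES="$FILES $a"
    else
        VMARGS="$VMARGS $a"
    fi
done
if [ -d "$JAVA_HOME" ]; then
    JAVA=$JAVA_HOME/bin/java
else
    JAVA=java
fi
# $VMARGS and $FILES are deliberately unquoted so they split into words.
$JAVA -cp $CP $VMARGS com.marklogic.ps.RecordLoader $FILES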
# end recordloader.sh
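To make the argument handling in that script concrete, here is a small sketch of the same loop pulled out into a function. classify_args is a hypothetical name used only for illustration. It shows that a property such as -DXML_REPAIR_LEVEL=FULL ends up among the JVM arguments, while a path that exists on disk ends up among the input files.

```shell
#!/bin/sh
# Hypothetical helper mirroring the loop in recordloader.sh:
# arguments that name existing paths are collected in FILES,
# everything else is collected in VMARGS.
classify_args() {
    FILES=
    VMARGS=
    for a in "$@"; do
        if [ -e "$a" ]; then
            FILES="$FILES $a"
        else
            VMARGS="$VMARGS $a"
        fi
    done
}

# Example: one existing (temporary) file and one property.
tmp=$(mktemp)
classify_args -DXML_REPAIR_LEVEL=FULL "$tmp"
echo "files:$FILES"
echo "vmargs:$VMARGS"
rm -f "$tmp"
```

One consequence of this design is that a mistyped file name is silently treated as a JVM argument, so it is worth checking the "xmlFiles.size" line in the log when a load seems to do nothing.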
Alan Darnell wrote:
I'm trying to load some documents that come to me with the following
error:
<AuthEnty affiliation=""Carleton University. Data Centre" ">
I know that entities are not allowed in attributes but I can't change
the program that generates these files.
I want to load them into MarkLogic and have ML take care of repairing
them. When I use:
xdmp:document-load("/odesi/slid_75M0010_E_2003ke.xml",
  <options xmlns="xdmp:document-load">
    <repair>full</repair>
    <permissions>{xdmp:default-permissions()}</permissions>
  </options>)
the documents load and the entities are removed. Great.
But I'd like to use recordloader for this task -- particularly in
conjunction with the AutoLoader program.
When running RecordLoader I'm using the XML_REPAIR_LEVEL=FULL
property, as in
java -cp recordloader.jar:xcc.jar:xpp3-1.1.4c.jar -DXML_REPAIR_LEVEL=FULL
but when recordloader tries to load the documents with these errors,
instead of correcting them it says:
SEVERE: exception
com.marklogic.xcc.exceptions.XQueryException: XDMP-STARTTAGCHAR:
Unexpected character "U" in start tag at cipo_912_1_E_1989-12/481 line 12
in /insert
Line 12 of that document has this in it:
<AuthEnty affiliation=""Carleton University. Data Centre" ">
Any ideas or help would be appreciated. Thanks in advance.
Alan
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general