Re: XSLT transform before update?

2008-04-20 Thread David Smiley @MITRE.org

Thanks Shalin.

The particular XSLT processor used is not relevant; XSLT is a spec.  Just use
the standard Java APIs.  If I want a particular processor, I can select it
with a system property, and/or you could offer a configuration option naming
the standard factory class implementation of the processor I want.
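
To illustrate what "the standard Java APIs" could look like here, below is a
minimal sketch using JAXP (javax.xml.transform). The class name
JaxpXsltSketch and the transform helper are made up for illustration and are
not part of Solr or DataImportHandler. TransformerFactory.newInstance() uses
whichever processor the JVM is configured with, and the
javax.xml.transform.TransformerFactory system property can point it at a
different factory class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
final class JaxpXsltSketch {

    // Applies the given stylesheet to the given XML and returns the output as a string.
    static String transform(String xslt, String xml) throws Exception {
        // No processor is named here: JAXP picks the implementation, honoring the
        // javax.xml.transform.TransformerFactory system property if it is set.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet passed to this helper could, for example, use the XPath concat()
function to join first-name and last-name into a single value.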

~ David


Shalin Shekhar Mangar wrote:
 
 Hi David,
 Actually you can concatenate values, however you'll have to write a bit of
 code. You can write this in JavaScript (if you're using Java 6) or in Java.
 
 Basically, you need to write a Transformer to do it. Look at
 http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
 
 For example, let's say you get fields first-name and last-name in the XML.
 But in the schema.xml you have a field called name in which you need to
 concatenate the values of first-name and last-name (with a space in
 between). Create a Java class:
 
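 import java.util.Map;

 public class ConcatenateTransformer {
     // Called once per row; adds a "name" field built from the two source fields.
     public Object transformRow(Map<String, Object> row) {
         String firstName = (String) row.get("first-name");
         String lastName = (String) row.get("last-name");
         row.put("name", firstName + " " + lastName);
         return row;
     }
 }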
 
 Add this class to Solr's classpath by putting its jar in solr/WEB-INF/lib.
 
 The data-config.xml should look like this:
 <entity name="myEntity" processor="XPathEntityProcessor"
         url="http://myurl/example.xml"
         transformer="com.yourpackage.ConcatenateTransformer">
     <field column="first-name" xpath="/record/first-name" />
     <field column="last-name" xpath="/record/last-name" />
     <field column="name" />
 </entity>
 
 This will call the ConcatenateTransformer.transformRow method for each row,
 and you can concatenate any field with any field (or constant). Note that
 the Solr document will keep only those fields that are in schema.xml; the
 rest are thrown away.
 
 If you don't want to write this in Java, you can use JavaScript by using
 the built-in ScriptTransformer. For an example, look at
 http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
 
 However, I'm beginning to realize that XSLT is a common need. Let me see
 how best we can accommodate it in DataImportHandler. Which XSLT processor
 would you prefer?
 
 On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org [EMAIL PROTECTED] wrote:
 

 I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
 awesome, but I'd also prefer it had the power of XSLT.  The XPath support
 in it doesn't suffice for me, and I can't do very basic things like
 concatenating one value with another, even a constant.  It's too bad
 there isn't a mode XSLT can be put into that avoids building the whole
 file in memory to do the transform.  I've been looking into this and have
 turned up nothing.  It would be neat if there were a StAX to
 multi-document adapter, at which point XSLT could be applied to the
 smaller fixed-size documents instead of the entire data stream.  I
 haven't found anything like this, so it'd need to be built.  For now my
 documents aren't too big to XSLT in-memory.
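
 As a rough sketch of the adapter idea above (not an existing library; the
 class name RecordSplitter, the method transformEach and the recordName
 parameter are all hypothetical), one could stream the input with StAX, copy
 each record element into its own small fragment, and run the XSLT on that
 fragment only. It assumes the records are not nested inside one another.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Templates;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical adapter, for illustration only.
final class RecordSplitter {

    // Streams the input, and for every element named recordName applies the
    // stylesheet to that element alone. Only one record is buffered at a time;
    // the outputs are collected in a list here purely to keep the sketch short.
    static List<String> transformEach(Reader xml, String recordName, Templates xslt)
            throws Exception {
        XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(xml);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        List<String> results = new ArrayList<>();
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(recordName)) {
                // Copy this element and its whole subtree into a standalone fragment.
                StringWriter fragment = new StringWriter();
                XMLEventWriter writer = outFactory.createXMLEventWriter(fragment);
                writer.add(event);
                int depth = 1;
                while (depth > 0) {
                    XMLEvent inner = in.nextEvent();
                    if (inner.isStartElement()) depth++;
                    if (inner.isEndElement()) depth--;
                    writer.add(inner);
                }
                writer.flush();
                writer.close();
                // Transform just the small fragment, not the entire stream.
                StringWriter out = new StringWriter();
                xslt.newTransformer().transform(
                        new StreamSource(new StringReader(fragment.toString())),
                        new StreamResult(out));
                results.add(out.toString());
            }
        }
        return results;
    }
}
```

 The stylesheet is compiled once into a Templates object and reused for every
 record, so memory use depends on the size of a single record rather than on
 the size of the whole file.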

 ~ David


 Daniel Papasian wrote:
 
  Shalin Shekhar Mangar wrote:
  Hi Daniel,

  Maybe if you can give us a sample of what your XML looks like, we can
  suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
  the use cases we have encountered so far are solvable using the
  XPathEntityProcessor in DataImportHandler without using XSLT. For
  details, look at
  http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
 
  I think even if it is possible to use SOLR-469 for my needs, I'd still
  prefer the XSLT approach, because it's going to be a bit of
  configuration either way, and I'd rather it be an XSLT stylesheet than
  solrconfig.xml.  In addition, I haven't yet decided whether I want to
  apply any patches to the version that we will deploy.  If I do go down
  the route of the XSLT transform patch and end up having to back it out,
  the work needed to do the transform at the XML source instead would be
  negligible, whereas going from using the DataImportHandler to not using
  it at all would be quite a bit of work.

  Because both the solr instance and the XML source are in house, I have
  the ability to apply the XSLT at the source instead of at solr.
  However, different teams control the XML source and solr, so doing it
  on the backend would require a bit more office coordination.
 
  The data is a FileMaker XML export (DTD fmresultset) and it looks
  roughly like this:
  <fmresultset>
    <resultset>
      <field name="ID"><data>125</data></field>
      <field name="organization"><data>Ford Foundation</data></field>
      ...
      <relatedset table="Employees">
        <record>
          <field name="ID"><data>Y5-A</data></field>
          <field name="Name"><data>John Smith</data></field>
        </record>
        <record>
          <field name="ID"><data>Y5-B</data></field>
          <field name="Name"><data>Jane Doe</data></field>
   

Re: DataField parsing error using BinaryResponseParser for solrj

2008-04-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is not a problem with the BinaryResponseWriter itself. It is caused
by the bug https://issues.apache.org/jira/browse/SOLR-470, which we need
to fix now.
--Noble

On Mon, Apr 21, 2008 at 9:16 AM, Eason. Lee [EMAIL PROTECTED] wrote:
 Solr throws an error while parsing the date field.
  It works fine with XMLResponseParser.

  Apr 22, 2008 11:02:13 AM org.apache.solr.common.SolrException log
  SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: 1995-02-16T00:00:00Z
      at org.apache.solr.schema.DateField.toObject(DateField.java:173)
      at org.apache.solr.schema.DateField.toObject(DateField.java:83)
      at org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:137)
      at org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:115)
      at org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:84)
      at org.apache.solr.common.util.NamedListCodec.writeVal(NamedListCodec.java:128)
      at org.apache.solr.common.util.NamedListCodec.writeNamedList(NamedListCodec.java:118)
      at org.apache.solr.common.util.NamedListCodec.marshal(NamedListCodec.java:77)
      at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:44)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
      at java.lang.Thread.run(Thread.java:619)
  Caused by: java.text.ParseException: Unparseable date: 1995-02-16T00:00:00Z
      at java.text.DateFormat.parse(DateFormat.java:337)
      at org.apache.solr.schema.DateField.toObject(DateField.java:170)
      ... 21 more




-- 
--Noble Paul


Re: DataField parsing error using BinaryResponseParser for solrj

2008-04-20 Thread Eason . Lee
Thanks
