AW: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread Hausherr, Jens
Hi, if you just need to preserve the XML for storage, you could simply wrap the XML markup in CDATA. Splitting your structure beforehand and using dynamic fields might be a viable solution... e.g. <add><doc><field name="foo1">value 1</field><field name="foo2">value 2</field><field
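
A minimal Python sketch of the suggestion above, combining both ideas: the raw markup goes into one field wrapped in CDATA, and the pre-split values go into separate fields. The field name `raw_xml` and the helper names are hypothetical, `foo1`/`foo2` are taken from the example in the message, and whether such names resolve to dynamic fields depends on what the Solr schema declares.

```python
from xml.sax.saxutils import escape, quoteattr


def cdata(text):
    # A CDATA section cannot contain "]]>", so any occurrence is split
    # across two adjacent sections; parsers rejoin them transparently.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_message(raw_xml, parts):
    """Build a Solr <add> message as a string.

    raw_xml -- the original markup, stored verbatim in a CDATA section
               (field name "raw_xml" is an assumption, not from the thread)
    parts   -- mapping of field name to pre-split plain-text value
    """
    fields = ["<field name=%s>%s</field>" % (quoteattr("raw_xml"), cdata(raw_xml))]
    for name, value in parts.items():
        # Plain values are escaped rather than wrapped in CDATA.
        fields.append("<field name=%s>%s</field>" % (quoteattr(name), escape(value)))
    return "<add><doc>" + "".join(fields) + "</doc></add>"


message = build_add_message(
    "<p><s>value 1</s> <s>value 2</s></p>",
    {"foo1": "value 1", "foo2": "value 2"},
)
print(message)
```

Note that CDATA only protects the markup in transit: an XML parser hands the field value to the indexer as plain text, so the tags are preserved in the stored value but are not interpreted as structure.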

RE: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread Binkley, Peter
I've used eXist for this kind of thing and had good experiences, once I got a grip on XQuery (which is definitely worth learning). But I've only used it for small collections (under 10k documents); I gather its effective ceiling is much lower than Solr's. Possibly it will be possible to use

Re: AW: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread David Neubert
Thanks -- CDATA might be useful -- and I was looking into dynamic fields as a solution as well -- I think a combination of the two might work. - Original Message From: Hausherr, Jens [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, November 8, 2007 4:03:02 AM Subject:

Re: AW: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread Chris Hostetter
: Thanks -- CDATA might be useful -- and I was looking into dynamic : fields as a solution as well -- I think a combination of the two might : work. I must admit I haven't been following this thread that closely, so I'm not sure how much of the structure of the XML you want to preserve for the

Re: AW: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread Tricia Williams
Hi Dave, This sounds like what I've been trying to work out with https://issues.apache.org/jira/browse/SOLR-380. The idea that I'm running with right now is indexing the XML and storing the data in the XML tags as a Payload. Payload is a relatively new idea from Lucene. A custom

Re: AW: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread David Neubert
Chris, I'll try to track down your Jira issue. (2) sounds very helpful -- I am only 2 days old in Solr/Lucene experience, but know what I need -- and basically it's to search by the main granules in an XML document, which usually turn out to be, for books: book (rarely), chapter (more often),

Re: What is the best way to index xml data preserving the mark up?

2007-11-08 Thread David Neubert
Thanks, I think storing the XPath is where I will ultimately wind up -- I will look into your links recommended below. It's an interesting debate where the break-even point is between Lucene/XPath storing XPath info -- utilizing that for lookup and position within DOM structures, versus a full
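
One way to picture "storing the XPath" is to emit one record per granule, each carrying a positional path back into the source document. The following is only a sketch under assumptions: the tag names `chapter`, `p` and `s` come from the granules mentioned in this thread, the function name `granules` is hypothetical, and how the path is stored (a field, a payload, or something else) is left open.

```python
import xml.etree.ElementTree as ET


def granules(root, wanted=("chapter", "p", "s")):
    """Yield (xpath, text) for every element whose tag is in `wanted`.

    The path uses 1-based positional steps such as /book/chapter[1]/p[2],
    counted among siblings that share the same tag.
    """
    def walk(elem, path):
        if elem.tag in wanted:
            # Collapse runs of whitespace so the text is ready for indexing.
            yield path, " ".join("".join(elem.itertext()).split())
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            step = "%s/%s[%d]" % (path, child.tag, counts[child.tag])
            yield from walk(child, step)

    yield from walk(root, "/" + root.tag)


book = ET.fromstring(
    "<book><chapter><p><s>First sentence.</s> <s>Second one.</s></p></chapter></book>"
)
records = list(granules(book))
for xpath, text in records:
    print(xpath, "->", text)
```

Each record could then become its own index document, with the path kept alongside the text so that a hit can be mapped back to its position in the original DOM.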

Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread Walter Underwood
If you really, really need to preserve the XML structure, you'll be doing a LOT of work to make Solr do that. It might be cheaper to start with software that already does that. I recommend MarkLogic -- I know the principals there, and it is some seriously fine software. Not free or open, but very,

What is the best way to index xml data preserving the mark up?

2007-11-07 Thread David Neubert
I am sure this is a 101 question, but I am a bit confused about indexing XML data using Solr. I have rich XML content (books) that needs to be searched at granular levels (specifically paragraph and sentence levels very accurately, no approximations). My source text has exact <p></p> and <s></s> tags for

Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread Norberto Meijome
On Wed, 7 Nov 2007 20:18:25 -0800 (PST) David Neubert [EMAIL PROTECTED] wrote: I am sure this is a 101 question, but I am a bit confused about indexing XML data using Solr. I have rich XML content (books) that needs to be searched at granular levels (specifically paragraph and sentence levels

Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread David Neubert
Thanks Walter -- I am aware of MarkLogic -- and agree -- but I have a very low budget on licensed software in this case (near 0) -- have you used eXist or Xindice? Dave - Original Message From: Walter Underwood [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday,