I remember seeing a paper about indexing xml using Lucene here: https://ssl.bnt.com/idealliance/papers/xmle02/dx_xmle02/html/abstract/03-02-08.html
I think it will be applicable to your problem. - Derk On Mon, Nov 17, 2008 at 4:06 PM, Aleksander M. Stensby < [EMAIL PROTECTED]> wrote: > Yes, you have a lot of fields but if you want to do faceting and all > possible fields you will probably have to index each node in your document > as a separate field. You should look at solr, which supports facets out of > the box. Keep in mind what type of tokenizer(s) you use as this effects your > results when querying. > > - Aleksander > > > On Mon, 17 Nov 2008 13:49:51 +0100, Bapat, Mayur <[EMAIL PROTECTED]> wrote: > > My xml format similar to as follows - >> >> <?xml version="1.0" encoding="UTF-8" ?><Attributes><IBA8909 >> type="float">0.0</IBA8909><IBA8654 type="float">0.0</IBA8654><folder >> type="double">8254</folder><inheritedDomain >> type="string">true</inheritedDomain><iterationInfo.branchId >> type="double">8438</iterationInfo.branchId><Part >> type="double">12771</Part><member >> type="double">8443</member><IBA8608 >> type="string">Ver-dir(TOP)</IBA8608><IBA9119 >> type="float">0.0</IBA9119><genericType >> type="string">standard</genericType><name >> type="string">A001</name><versionInfo.identifier.versionId >> type="string">A</versionInfo.identifier.versionId><thePersistInfo.update >> Count >> type="double">3</thePersistInfo.updateCount><IBA8929 >> type="string">Carbon >> film</IBA8929><iterationInfo.creator >> type="double">10</iterationInfo.creator><iterationInfo.note >> type="string">just >> classified</iterationInfo.note><IBA8950 type="float">0</IBA8950><IBA8794 >> type="string">default</IBA8794><defaultTraceCode >> type="string">0</defaultTraceCode><IBA8866 >> type="double">0</IBA8866><IBA8897 >> type="float">0.0</IBA8897><name >> type="string">TRIMMER</name><versionInfo.identifier.versionSortId >> type="string">000001</versionInfo.identifier.versionSortId><number >> type="string">0000000001</number><iterationInfo.state >> type="string">ctrld</iterationInfo.state><IBA8925 >> type="float">0</IBA8925><IBA8754 >> type="float">0.0</IBA8754><IBA9097 type="float">0.0</IBA9097><IBA9123 >> type="float">0.0</IBA9123><thePersistInfo.createStamp >> type="datetime">2008-07-16T08:34:20</thePersistInfo.createStamp><state.s >> tate >> type="string">INWORK</state.state><masterReference >> type="double">8436</masterReference><IBA8839 >> type="float">0.0</IBA8839><thePersistInfo.updateStamp >> type="datetime">2008-08-01T10:44:35</thePersistInfo.updateStamp><IBA8776 >> type="string">True</IBA8776><view type="double">2931</view><IBA8653 >> type="float">0.0</IBA8653><IBA8802 >> type="float">0.0</IBA8802><organizationReference >> type="double">856</organizationReference><partType >> type="string">separable</partType><IBA8786 >> type="float">0.0</IBA8786><checkoutInfo.state >> type="string">c/i</checkoutInfo.state><thePersistInfo.markForDelete >> type="double">0</thePersistInfo.markForDelete><IBA8733 >> type="string">0</IBA8733><state.atGate >> type="string">false</state.atGate><iterationInfo.latest >> type="string">true</iterationInfo.latest><IBA8800 >> type="double">0</IBA8800><Part >> type="string">IBRTYPE|Part~~WCP|22037|1</Part><IBA9062 >> type="float">0.0</IBA9062><thePersistInfo.modifyStamp >> type="datetime">2008-08-01T10:44:35</thePersistInfo.modifyStamp><iterati >> onInfo.predecessor >> type="double">8440</iterationInfo.predecessor><IBA9035 >> type="string">Industry</IBA9035><IBA8806 type="float">0</IBA8806><source >> type="string">make</source><state.lifeCycleId >> type="double">6193</state.lifeCycleId><IBA8827 >> type="string">Through.Hole</IBA8827><iterationInfo.modifier >> type="double">10</iterationInfo.modifier><IBA8798 >> type="double">0</IBA8798><IBA8825 >> type="string">1</IBA8825><IBA8992 type="string">Open >> type</IBA8992><IBA8949 >> type="string">Yes</IBA8949><versionInfo.identifier.versionLevel >> type="double">1</versionInfo.identifier.versionLevel><defaultUnit >> type="string">ea</defaultUnit><containerReference >> type="double">8212</containerReference><endItem >> type="string">false</endItem><IBA8796 >> type="string">1</IBA8796><folderingInfo.cabinet >> type="double">8254</folderingInfo.cabinet><domainRef >> type="double">8216</domainRef><IBA9060 >> type="float">0.0</IBA9060><iterationInfo.identifier.iterationId >> type="string">2</iterationInfo.identifier.iterationId><IBA8080 >> type="string">sizeCode</IBA8080><IBA8891 >> type="float">0.0</IBA8891><IBA8804 >> type="float">0</IBA8804><IBA9005 >> type="float">0.0</IBA9005><checkoutInfo.derivedFrom >> type="double">8440</checkoutInfo.derivedFrom></Attributes> >> >> and I am interested in finding a document by giving a scope like - >> find a document with IBA8949=Yes AND IBA8802=0.0 and >> iterationInfo.predecessor=OR. >> >> It looks like for each element from this XML, I need to create a field >> for the index document and create a search query accordingly. >> >> >> >> >> -----Original Message----- >> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] >> Sent: Monday, November 17, 2008 1:55 PM >> To: java-user@lucene.apache.org >> Subject: Re: Scoped Search and Facets generation using Lucene >> >> I think that the closest you get to "scoped" search in your case would >> be to use filters. (If you index your paths, or if the documents have >> some standarized format, I assume you could just use one field per >> element in your document.) >> >> Maybe you could say a bit about you document structure? (If your >> question is a XML specific parsing question, the Lucene mailing list >> isn't really right place...) >> >> - Aleksander >> >> >> On Fri, 14 Nov 2008 17:52:59 +0100, Otis Gospodnetic >> <[EMAIL PROTECTED]> wrote: >> >> Hi Mayur, >>> >>> Solr has built-in support for facets. I don't understand what you >>> mean by scoped searches. Could you please give a concrete example? >>> >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> >>> ________________________________ >>> From: "Bapat, Mayur" <[EMAIL PROTECTED]> >>> To: java-user@lucene.apache.org >>> Sent: Friday, November 14, 2008 3:04:12 AM >>> Subject: Scoped Search and Facets generation using Lucene >>> >>> Hi, >>> >>> Does Lucene support Scoped Searches? My intention is to index an XML >>> String and search for a matching element/attribute value from that XML >>> >> >> by specifying scope(path). >>> Also is there any direct support for Facets building in Lucene? >>> >>> Regards, >>> Mayur >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [EMAIL PROTECTED] >>> For additional commands, e-mail: [EMAIL PROTECTED] >>> >> >> >> >> -- >> Aleksander M. Stensby >> Senior software developer >> Integrasco A/S >> www.integrasco.no >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> >> > > > -- > Aleksander M. Stensby > Senior software developer > Integrasco A/S > www.integrasco.no > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >