Yes, that much I know and understand.
I do in fact have a datastream, e.g. PDF_DOC, its MIME type is application/pdf,
and it is used to extract text.
What surprises me is that in basicFoxmlToLucene.xslt that IndexField is
defined as TOKENIZED, but it gets indexed if and only if I set
fgsindex.untokenizedFields = dsm
in index.properties.
Why?
Thanks for any answer.
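
For concreteness, here is how I understand the field naming described in the quoted reply below, as a rough Python sketch rather than gsearch's actual code. The sample FOXML fragment, the PID demo:1 and the helper name managed_field_names are made up for illustration, and I am assuming the attribute is spelled CONTROL_GROUP in the FOXML. Text extraction via exts:getDatastreamText() is left out entirely.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"

# Hypothetical, heavily trimmed FOXML object, for illustration only.
SAMPLE_FOXML = f"""
<foxml:digitalObject xmlns:foxml="{FOXML_NS}" PID="demo:1">
  <foxml:datastream ID="DC" CONTROL_GROUP="X"/>
  <foxml:datastream ID="PDF_DOC" CONTROL_GROUP="M"/>
</foxml:digitalObject>
"""


def managed_field_names(foxml_text):
    """Mimic the xsl:for-each: one 'dsm.<ID>' name per managed datastream."""
    root = ET.fromstring(foxml_text)
    names = []
    for ds in root.findall(f"{{{FOXML_NS}}}datastream"):
        # Assumption: managed datastreams carry CONTROL_GROUP="M".
        if ds.get("CONTROL_GROUP") == "M":
            # Same idea as concat('dsm.', @ID) in the stylesheet.
            names.append("dsm." + ds.get("ID"))
    return names


print(managed_field_names(SAMPLE_FOXML))
```

So with a managed datastream PDF_DOC I would expect a field named dsm.PDF_DOC to show up in browseIndex.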
On Wed, Oct 20, 2010 at 9:10 AM, Gert Schmeltz Pedersen <g...@dtic.dtu.dk> wrote:
> I think the main point to understand is that the xsl:for-each selects one
> managed foxml:datastream element at a time, it has an ID attribute with a
> value, e.g. “PDF_DOC”. Then the IFname attribute of the IndexField element
> gets its value from the concat() function, e.g. “dsm.PDF_DOC”. So the
> prerequisite is that your foxml has such a datastream, and it is the
> mimetype of this that is used by exts:getDatastreamText() to extract the
> text from the document.
>
>
>
> Best regards,
>
> Gert
>
>
>
> *From:* Matteo Boschini [mailto:matteo.bosch...@gmail.com]
> *Sent:* 19. oktober 2010 16:22
> *To:* Fedora Users
> *Subject:* Re: [fcrepo-user] dummy question on gsearch
>
>
>
> OK, solved, but I do not understand why:
> dsm is defined as TOKENIZED in basicFoxmlToLucene, but if I modify
> index.properties with this line:
>
> fgsindex.untokenizedFields = dsm
>
> dsm "magically" gets indexed.
> Why? What am I missing?
>
> On Tue, Oct 19, 2010 at 3:35 PM, Matteo Boschini <
> matteo.bosch...@gmail.com> wrote:
>
> Sorry for this very dumb question...
> I've succeeded in setting up gsearch with full-text datastream Lucene
> indexing at least 4 times, but now I can no longer do it...
>
> I have a BasicIndex config/setup, and am trying to get some PDF datastreams
> full-text indexed.
>
> basicFoxmlToLucene has lines saying:
>
> <xsl:for-each select="foxml:datastre...@control_group='M']">
>   <IndexField index="TOKENIZED" store="YES" termVector="NO">
>     <xsl:attribute name="IFname">
>       <xsl:value-of select="concat('dsm.', @ID)"/>
>     </xsl:attribute>
>     <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME, @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
>   </IndexField>
> </xsl:for-each>
>
> Thus I assume that managed datastreams of a MIME type defined in
> fedoragsearch.properties (in my case application/pdf, which is there by
> default) should be indexed, but they are not (I also checked with Luke).
> And in fact, in browseIndex, the field name dsm.ID is not listed.
>
> I'm surely missing something stupid; maybe someone out there can help
> me...
>
>
>
>
>
> ------------------------------------------------------------------------------
> Download new Adobe(R) Flash(R) Builder(TM) 4
> The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
> Flex(R) Builder(TM)) enable the development of rich applications that run
> across multiple browsers and platforms. Download your free trials today!
> http://p.sf.net/sfu/adobe-dev2dev
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>
>