Just for context, here is a snippet from index.properties:

#fgsindex.untokenizedFields = list of index fields created as UN_TOKENIZED
###########################
# Effect: during search the KeywordAnalyzer is used for untokenized fields,
# while the fgsindex.analyzer is used for other fields.
# Only untokenized fields, which do not occur in every index document,
# need be listed here.
# example:

#fgsindex.untokenizedFields = fgs.contentModel uf1 uf2

Stating fgsindex.untokenizedFields = dsm
can only have an effect if an untokenized IndexField named "dsm" is defined in
the indexing stylesheet. It cannot have any influence on, or connection to, a
tokenized field called "dsm.PDF_DOC", so there must be something else behind
your observation.
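Gert's point can be illustrated with a small model. This is a hypothetical sketch, not gsearch's source code. It assumes, as the reply above implies, that fgsindex.untokenizedFields is a space-separated list of exact field names, and that a listed name only switches the search-time analyzer to KeywordAnalyzer for the field with exactly that name. The helper names `parse_untokenized_fields` and `analyzer_for` are invented for illustration.

```python
# Hypothetical model of fgsindex.untokenizedFields (an assumption based on
# the explanation in this thread, not gsearch's actual implementation).


def parse_untokenized_fields(value: str) -> set[str]:
    """Split a property value like 'fgs.contentModel uf1 uf2' into field names."""
    return set(value.split())


def analyzer_for(field: str, untokenized: set[str]) -> str:
    """Return the analyzer that would be used at search time for a field."""
    # Exact name match only: listing 'dsm' does not cover 'dsm.PDF_DOC'.
    if field in untokenized:
        return "KeywordAnalyzer"
    return "fgsindex.analyzer"


if __name__ == "__main__":
    listed = parse_untokenized_fields("dsm")
    for name in ("dsm", "dsm.PDF_DOC"):
        print(name, "->", analyzer_for(name, listed))
```

Under this model the setting only affects how queries are analyzed at search time, so on its own it would not explain why dsm.PDF_DOC starts appearing in the index.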


On 20/10/2010, at 09.35, Matteo Boschini wrote:

Yes, I know and understand that.
I do in fact have a datastream, e.g. PDF_DOC, whose MIME type is PDF, and it is
used to extract the text.
What surprises me is that in basicFoxmlToLucene.xslt that IndexField is defined
as TOKENIZED, but it gets indexed if and only if I set

fgsindex.untokenizedFields              = dsm

in index.properties.
Why?
Thanks for any answer

On Wed, Oct 20, 2010 at 9:10 AM, Gert Schmeltz Pedersen
<g...@dtic.dtu.dk> wrote:
I think the main point to understand is that the xsl:for-each selects one
managed foxml:datastream element at a time, and each such element has an ID
attribute with a value, e.g. “PDF_DOC”. The IFname attribute of the IndexField
element then gets its value from the concat() function, e.g. “dsm.PDF_DOC”. So
the prerequisite is that your foxml has such a datastream, and it is the MIME
type of this datastream that exts:getDatastreamText() uses to extract the text
from the document.

Best regards,
Gert
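The selection and naming step Gert describes can be mimicked outside XSLT to see which field names a given FOXML file should produce. This is a minimal sketch under stated assumptions. The FOXML namespace URI, the sample document, and the `[@CONTROL_GROUP='M']` predicate are assumed here, because the select expression in the stylesheet quoted below is elided. Text extraction via exts:getDatastreamText() is not modeled. The function name `managed_field_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed FOXML namespace URI; check it against your own FOXML files.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"

# Made-up sample object: one inline (X) and two managed (M) datastreams.
SAMPLE_FOXML = f"""<foxml:digitalObject xmlns:foxml="{FOXML_NS}" PID="demo:1">
  <foxml:datastream ID="DC" CONTROL_GROUP="X"/>
  <foxml:datastream ID="PDF_DOC" CONTROL_GROUP="M"/>
  <foxml:datastream ID="THUMB" CONTROL_GROUP="M"/>
</foxml:digitalObject>"""


def managed_field_names(foxml: str) -> list[str]:
    """Mimic the xsl:for-each: one 'dsm.<ID>' field name per managed datastream."""
    root = ET.fromstring(foxml)
    ns = {"foxml": FOXML_NS}
    names = []
    # Assumed equivalent of selecting foxml:datastream[@CONTROL_GROUP='M']
    for ds in root.findall("foxml:datastream[@CONTROL_GROUP='M']", ns):
        names.append("dsm." + ds.get("ID"))  # like concat('dsm.', @ID)
    return names


if __name__ == "__main__":
    print(managed_field_names(SAMPLE_FOXML))
```

If a real object yields no dsm.* names here, the stylesheet has nothing to index, whatever index.properties says.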

From: Matteo Boschini
[mailto:matteo.bosch...@gmail.com]
Sent: 19. oktober 2010 16:22
To: Fedora Users
Subject: Re: [fcrepo-user] dummy question on gsearch

OK, solved, but I do not understand why:
dsm is defined as TOKENIZED in basicFoxmlToLucene, but if I modify
index.properties with this line:

fgsindex.untokenizedFields              = dsm

"magically" dsm gets indexed.
Why? What am I missing?
On Tue, Oct 19, 2010 at 3:35 PM, Matteo Boschini
<matteo.bosch...@gmail.com> wrote:
Sorry for this very dumb question...
I've succeeded in setting up gsearch with full-text Lucene indexing of
datastreams at least 4 times, but now I can no longer do it...

I have a BasicIndex config/setup, and am trying to get some PDF datastreams 
full-text indexed.

basicFoxmlToLucene has lines saying:

<xsl:for-each select="foxml:datastre...@control_group='M']">
    <IndexField index="TOKENIZED" store="YES" termVector="NO">
        <xsl:attribute name="IFname">
            <xsl:value-of select="concat('dsm.', @ID)"/>
        </xsl:attribute>
        <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME, @ID,
            $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
    </IndexField>
</xsl:for-each>

Thus I assume that Managed datastreams with a MIME type defined in
fedoragsearch.properties (actually application/pdf, which is there by default)
should be indexed, but they are not (I also checked with Luke).
And indeed, in browseIndex, the field name dsm.ID is not listed.

I'm surely missing something stupid; maybe someone out there can help me...




------------------------------------------------------------------------------
Download new Adobe(R) Flash(R) Builder(TM) 4
The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
Flex(R) Builder(TM)) enable the development of rich applications that run
across multiple browsers and platforms. Download your free trials today!
http://p.sf.net/sfu/adobe-dev2dev
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

