Bugs item #2929885, was opened at 2010-01-11 17:01
Message generated for change (Comment added) made by jflokstra
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2929885&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PFtijah
Group: Pathfinder CVS Head
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roberto Cornacchia (cornuz)
Assigned to: Jan Flokstra (jflokstra)
Summary: PFTIJAH: whitelist in "_param" bat incomplete
Initial Comment:
I create the pftijah index with a whitelist, as in:
<TijahOptions stemmer="porter-english" whitelist="invention-title abstract
description claim"/>
The whitelist works well. The tag dictionary contains only the tags I wanted,
plus all the tags nested inside them:
mil>bat("tj_DFLT_FT_INDEX_tagdict").print();
#---------------------------------#
# h t # name
# void str # type
#---------------------------------#
[ 0...@0, "_DOCUMENT_ROOT" ]
[ 1...@0, "invention-title" ]
[ 2...@0, "abstract" ]
[ 3...@0, "p" ]
[ 4...@0, "description" ]
[ 5...@0, "ul" ]
[ 6...@0, "li" ]
[ 7...@0, "sup" ]
[ 8...@0, "sl" ]
[ 9...@0, "heading" ]
[ 1...@0, "u" ]
[ 1...@0, "tables" ]
[ 1...@0, "table" ]
[ 1...@0, "tgroup" ]
[ 1...@0, "tbody" ]
[ 1...@0, "row" ]
[ 1...@0, "entry" ]
[ 1...@0, "sub" ]
[ 1...@0, "img" ]
[ 1...@0, "b" ]
[ 2...@0, "claim" ]
[ 2...@0, "claim-text" ]
However, the "tj_DFLT_FT_INDEX_param" bat contains only the FIRST tag specified
in my whitelist:
mil>bat("tj_DFLT_FT_INDEX_param").print();
#-------------------------------------------------#
# h t # name
# str str # type
#-------------------------------------------------#
[ "curFragment", "0" ]
[ "preExpansion", "4" ]
[ "_version", "1.1" ]
[ "stemmer", "porter-english" ]
[ "tokenizer", "flex" ]
[ "name", "DFLT_FT_INDEX" ]
[ "fragmentSize", "1073741823" ]
[ "delay_finalize", "0" ]
[ "whitelist", "invention-title" ]
[ "lastStopWord", "430" ]
[ "_last_tijahPre", "7662" ]
[ "status", "finalized" ]
[ "collectionSize", "7055" ]
[ "_last_finalizedPre", "7661" ]
----------------------------------------------------------------------
>Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 16:26
Message:
I fixed it by making a copy of the value before tokenizing it with
strtok(). This works for me.
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 16:10
Message:
I have now reproduced the error. It occurs only when documents are indexed;
with no documents indexed it does not occur. I think a string inserted into
a [str,str] BAT is modified in the "C" program AFTER it has been inserted
into the BAT. I will try to fix it by modifying a copy of the whitelist
value.
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 15:54
Message:
Strange bug! When I do in the HEAD:
tijah:create-ft-index(("S1","S2","S3"),<TijahOptions ft-index="TRY"
stemmer="porter-english" whitelist="invention-title panel tag1 tag2 tag3"/>)
and then print the BAT you printed, I get a good result:
MonetDB>bat("tj_TRY_param").print();
#-----------------------------------------------------------------#
# h t # name
# str str # type
#-----------------------------------------------------------------#
[ "stemmer", "porter-english" ]
[ "fragmentSize", "1073741823" ]
[ "curFragment", "0" ]
[ "preExpansion", "4" ]
[ "lastStopWord", "0" ]
[ "_version", "1.1" ]
[ "name", "TRY" ]
[ "_last_tijahPre", "1" ]
[ "tokenizer", "flex" ]
[ "delay_finalize", "0" ]
[ "whitelist", "invention-title panel tag1 tag2 tag3" ]
[ "collectionSize", nil ]
[ "status", "finalized" ]
[ "_last_finalizedPre", "0" ]
MonetDB>
I will try to reproduce the error another way, but I'm a little bit stymied
because I do not change the whitelist parameter value during creation.
----------------------------------------------------------------------