Bugs item #2929885, was opened at 2010-01-11 17:01
Message generated for change (Settings changed) made by jflokstra
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2929885&group_id=56967

Category: PFtijah
Group: Pathfinder CVS Head
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Roberto Cornacchia (cornuz)
Assigned to: Jan Flokstra (jflokstra)
Summary: PFTIJAH: whitelist in "_param" bat incomplete 

Initial Comment:
I create the pftijah index with a whitelist, as in:

<TijahOptions stemmer="porter-english" whitelist="invention-title abstract 
description claim"/>


The whitelist works well. The tag dictionary contains only the tags I wanted,
plus all the tags nested inside them.
mil>bat("tj_DFLT_FT_INDEX_tagdict").print();
#---------------------------------#
# h     t                         # name
# void  str                       # type
#---------------------------------#
[ 0...@0,         "_DOCUMENT_ROOT"        ]
[ 1...@0,         "invention-title"       ]
[ 2...@0,         "abstract"              ]
[ 3...@0,         "p"                     ]
[ 4...@0,         "description"           ]
[ 5...@0,         "ul"                    ]
[ 6...@0,         "li"                    ]
[ 7...@0,         "sup"                   ]
[ 8...@0,         "sl"                    ]
[ 9...@0,         "heading"               ]
[ 1...@0,         "u"                     ]
[ 1...@0,         "tables"                ]
[ 1...@0,         "table"                 ]
[ 1...@0,         "tgroup"                ]
[ 1...@0,         "tbody"                 ]
[ 1...@0,         "row"                   ]
[ 1...@0,         "entry"                 ]
[ 1...@0,         "sub"                   ]
[ 1...@0,         "img"                   ]
[ 1...@0,         "b"                     ]
[ 2...@0,         "claim"                 ]
[ 2...@0,         "claim-text"            ]


However, the "tj_DFLT_FT_INDEX_param" bat contains only the FIRST tag specified 
in my whitelist:

mil>bat("tj_DFLT_FT_INDEX_param").print();;
#-------------------------------------------------#
# h                     t                         # name
# str                   str                       # type
#-------------------------------------------------#
[ "curFragment",          "0"                     ]
[ "preExpansion",         "4"                     ]
[ "_version",             "1.1"                   ]
[ "stemmer",              "porter-english"        ]
[ "tokenizer",            "flex"                  ]
[ "name",                 "DFLT_FT_INDEX"         ]
[ "fragmentSize",         "1073741823"            ]
[ "delay_finalize",       "0"                     ]
[ "whitelist",            "invention-title"       ]
[ "lastStopWord",         "430"                   ]
[ "_last_tijahPre",       "7662"                  ]
[ "status",               "finalized"             ]
[ "collectionSize",       "7055"                  ]
[ "_last_finalizedPre",   "7661"                  ]


----------------------------------------------------------------------

Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 16:26

Message:
I fixed it by making a copy of the value before tokenizing it with
strtok(). This works for me.
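
A minimal sketch of the "tokenize a copy" approach described above. This is
not the actual PFtijah source; the helper name count_tags_on_copy is
hypothetical and only illustrates that the caller's string stays intact.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not PFtijah code): counts the space-separated tags
 * in `value` without modifying `value` itself. Returns -1 if the copy
 * cannot be allocated. */
int count_tags_on_copy(const char *value)
{
    assert(value != NULL);

    /* Duplicate the string so strtok() only ever writes into the copy. */
    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len);

    int n = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        n++;

    free(copy);
    return n;
}
```

Because only the copy is tokenized, a string that has already been stored
elsewhere (for example as the "whitelist" value) keeps all of its tags.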

----------------------------------------------------------------------

Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 16:10

Message:
I have now reproduced the error. It only occurs when documents are indexed;
it does not occur when no documents are indexed. I think a string inserted
in a [str,str] bat is modified in the "C" program AFTER it has been inserted
in the BAT. I will try to fix it by modifying a copy of the whitelist
value.
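
The suspected mechanism can be sketched as follows. This is an illustration
only, not the PFtijah code; count_tags_in_place is a hypothetical name.
strtok() writes a '\0' over each delimiter it finds, so any other reader of
the same buffer afterwards sees just the first token.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not PFtijah code): counts the space-separated tags
 * in `value` by tokenizing the buffer in place. Side effect: strtok()
 * replaces each delimiter it consumes with '\0', so the buffer is
 * truncated to its first token from the caller's point of view. */
int count_tags_in_place(char *value)
{
    assert(value != NULL);

    int n = 0;
    for (char *tok = strtok(value, " "); tok != NULL; tok = strtok(NULL, " "))
        n++;
    return n;
}
```

Called on a buffer holding "invention-title abstract description claim",
this returns 4, but the buffer then reads "invention-title", which matches
the truncated "whitelist" entry in the report.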

----------------------------------------------------------------------

Comment By: Jan Flokstra (jflokstra)
Date: 2010-02-01 15:54

Message:
Strange bug! When I run the following in the HEAD:

tijah:create-ft-index(("S1","S2","S3"),<TijahOptions ft-index="TRY"
stemmer="porter-english" whitelist="invention-title panel tag1 tag2 tag3"/>)

and then print the bat you printed, I get a good result.

MonetDB>bat("tj_TRY_param").print();
#-----------------------------------------------------------------#
# h                     t                                         # name
# str                   str                                       # type
#-----------------------------------------------------------------#
[ "stemmer",              "porter-english"                        ]
[ "fragmentSize",         "1073741823"                            ]
[ "curFragment",          "0"                                     ]
[ "preExpansion",         "4"                                     ]
[ "lastStopWord",         "0"                                     ]
[ "_version",             "1.1"                                   ]
[ "name",                 "TRY"                                   ]
[ "_last_tijahPre",       "1"                                     ]
[ "tokenizer",            "flex"                                  ]
[ "delay_finalize",       "0"                                     ]
[ "whitelist",            "invention-title panel tag1 tag2 tag3"  ]
[ "collectionSize",       nil                                     ]
[ "status",               "finalized"                             ]
[ "_last_finalizedPre",   "0"                                     ]
MonetDB>                          

I will try to reproduce the error another way, but I'm a little bit stymied
because I do not change the whitelist parameter value during creation.


----------------------------------------------------------------------

_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs