Hi,

William Lachance wrote:
> On 3/13/06, John Steele Scott <[EMAIL PROTECTED]> wrote:
>> I do like the idea of making a filter which supports hot words, metadata
>> and snippets, but alas, the weekend is over. Maybe next time I have some
>> hacking time I'll have another look.
> You could definitely output the metadata and hot words as part of a
> text stream, so Fridrich's approach would work here too. See the
> 'wpd2raw' utility in libwpd (which we use in our regression testing
> framework) for an example.

I just added an "--info" switch to wpd2text. "wpd2text --info <infile>" outputs the document's metadata (one entry per line, in label-space-content format) instead of the document itself. Without the switch, wpd2text behaves as before.
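
For anyone who wants to consume that output from a script, here is a rough sketch of how it might be parsed. It assumes each line is a label, then a single space, then the content; the function name and the sample labels are made up for illustration and are not taken from wpd2text.

```python
def parse_info_output(text):
    """Turn 'label content' lines into a dict.

    Assumption: each non-blank line is a label, one space, then the content.
    A line with no space is kept as a label with empty content.
    """
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first space only, so the content may contain spaces.
        label, _, content = line.partition(" ")
        metadata[label] = content
    return metadata


# Hypothetical sample; the labels wpd2text really emits may differ.
sample = "author Jane Doe\nsubject Quarterly report\n"
print(parse_info_output(sample))
```

Splitting on the first space only keeps multi-word content intact, which seems the safest reading of "label-space-content".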

I know that Nat Friedman was demonstrating relevance classification in odt documents using "hot words", and it can make many people enthusiastic. Nevertheless (the explanation would be lengthy, so I will give it only on request), I highly doubt that a word's text attribute in a WP document is the best indicator of its relevance, at least at this stage of conversion. IMHO, wpd2text should do text ;-) and only text. But this is my own view, and I have no problem being convinced of the contrary by sound arguments :-)

On another note, whenever I have time, I am trying to work on headers/footers in WP3 and WP5 documents. For the time being, enabling them makes my WP{3,5}ContentListener crash on the first header in "subDocument->parse(this)" in _handleSubDocument... Probably a problem with virtual calls, but I will investigate more. After that, I would like to handle tabs somewhat more correctly in these parsers and eventually release 0.8.5, but this is not yet part of the immediate future.

Cheers

Fridrich


_______________________________________________
Libwpd-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/libwpd-devel
