Re: is it possible to index wiki markup files?

Reyna Melara Wed, 11 Jan 2012 19:43:20 -0800

Thanks to everyone who replied to my question.

Best regards,


Reyna

2012/1/11 Michael Wechner <[email protected]>

> Maybe Tika is also of help to you
>
> http://tika.apache.org/
>
> HTH
>
> Michael
>
> On 11.01.12 20:13, Reyna Melara wrote:
>
>> Hi, my name is Reyna Melara. I'm a PhD student from Mexico, and I have a
>> set of 11,051,447 files with a .txt extension whose content is in fact in
>> wiki format. I want and need them to be indexed, but I don't know whether
>> I have to convert this content to flat text first. I have been reading and
>> have found that:
>>
>> "At the core of Lucene's logical architecture is the idea of a *document*
>> containing *fields* of text. This flexibility allows Lucene's API to be
>> independent of the file format <http://en.wikipedia.org/wiki/File_format>.
>> Text from PDFs <http://en.wikipedia.org/wiki/Portable_Document_Format>,
>> HTML <http://en.wikipedia.org/wiki/HTML>,
>> Microsoft Word <http://en.wikipedia.org/wiki/Microsoft_Word>, and
>> OpenDocument <http://en.wikipedia.org/wiki/OpenDocument> documents, as well
>> as many others (except images), can all be indexed as long as their
>> textual information can be extracted."
>>
>> So I guess there's no problem if I leave the files just as they are.
>>
>> My question is: will I get the same results and benefits from these
>> files? Will that work well?
>>
>> Thanks a lot, and best regards.
>>
>>
>>
>
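
On the question of whether the wiki content has to be converted to flat
text first: the sketch below shows roughly what such a conversion step
could look like before the text is handed to an indexer. It is only an
illustration under assumptions. The function name strip_wiki_markup is
hypothetical (it is not part of Lucene or Tika), the files are assumed to
use MediaWiki-style markup, and the regular expressions cover only simple
cases (no nested templates, tables, or references).

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Reduce common MediaWiki-style markup to plain text (rough, regex-based)."""
    # Drop simple, non-nested templates such as {{stub}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # [http://example.org label] -> label; a bare [http://example.org] is dropped.
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    # == Heading == -> Heading (one heading per line).
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    # '''bold''' and ''italic'' markers are removed; single apostrophes stay.
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces and tabs, keeping line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "== Lucene ==\n"
        "'''Lucene''' is a [[Information retrieval|search]] library. {{stub}}"
    )
    print(strip_wiki_markup(sample))
```

The returned string is what would go into a text field of the indexed
document. Indexing the raw files also works, but then the markup tokens
(link targets, template names, and so on) end up in the index as terms.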


-- 
Reyna