Hi everybody,
Sorry to bring this issue up again with such a long mail, but I really
can't get my plugin loaded.
I have read and applied the suggestions given in various previous
postings on this list,
but I still have not gotten any results.
Well basically I have used part of the code written for the
Sanjeev,
You have implemented the Thai language, right? What other changes have you made
to the original code? Do I need to make the same changes for, say, the Hindi and Punjabi
languages?
If you have a bit of time to explain things to him, it will be of great help to
me.
Thank you
./Arun
On 11/8/06, sanjeev
Sorry, in my previous posting the output of nutch readseg -get was
wrong. Here is the actual output:
-Corrado
SegmentReader: get 'http://testmachine.test.net/index.html'
Content::
Version: 2
url: http://testmachine.test.net/index.html
base: http://testmachine.test.net/index.html
contentType:
Arun,
I tried implementing Thai search for Nutch.
I followed the steps outlined in this tutorial for Chinese:
http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_62153
So sorry - I am not able to help much. How urgent is your requirement ?
Mine is very urgent as I have to get
Sanjay,
I don't think you should follow the Chinese example and extend the CJK
range.
This was needed because Chinese and Japanese don't use spaces to separate
words. I believe Thai uses spaces, right? If so, you should extend the
LETTER
range to include Thai characters rather than CJK.
Another place
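
For illustration only, the idea of treating Thai characters as letters (rather than handling them like CJK) can be sketched in plain Java. This is not the NutchAnalysis.jj grammar itself; the class and method names are hypothetical, and the only fact relied on is that the Thai Unicode block covers U+0E00 to U+0E7F.

```java
public class ThaiLetterRange {
    // True if the character lies in the Thai Unicode block (U+0E00..U+0E7F).
    static boolean isThai(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.THAI;
    }

    // Hypothetical "letter" test: ASCII letters plus anything in the Thai block.
    // A real tokenizer grammar would express this as a character range instead.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || isThai(c);
    }

    public static void main(String[] args) {
        char thai = '\u0e01'; // THAI CHARACTER KO KAI
        char cjk = '\u4e2d';  // a CJK ideograph, for contrast
        System.out.println("Thai char is letter: " + isLetter(thai));
        System.out.println("CJK char is letter: " + isLetter(cjk));
    }
}
```

Whether this is enough depends on whether Thai text actually separates words with spaces, which is the open question in this thread.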
Regarding Thai, there is a Thai Analyzer in Lucene already:
$ ll contrib/analyzers/src/java/org/apache/lucene/analysis/th/
total 24
drwxrwxr-x 7 otis otis 4096 Oct 27 02:08 .svn/
-rw-rw-r-- 1 otis otis 1528 Jun 5 14:27 ThaiAnalyzer.java
-rw-rw-r-- 1 otis otis 2437 Jun 5 14:27
Hi,
I have a problem now: I can't build Nutch on Linux with Ant.
My Ant version is
Apache Ant version 1.5.2-20 compiled on September 25 2003
The error is below.
Has anyone had the same problem? I need your help.
Buildfile: build.xml
BUILD FAILED
OK. I downloaded the LuceneInAction code examples from the book and found
there were some
analyzers and tests/demos which included Chinese.
But these analyzers were standalone Java programs with a main method.
My question is how to integrate them into Nutch so the index created by the crawl
process can
OK Kuro - you are wrong about the Thai language having spaces between words.
Thai doesn't put spaces between words, and segmenting Thai is a bit tricky,
methinks.
Will appreciate any/all help you can give me
cheers,
sanjeev
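
As a rough sketch of what segmenting text without spaces involves, the JDK's own java.text.BreakIterator can be asked for word boundaries under a Thai locale. This is only an assumption-laden illustration: the class name is hypothetical, it is not wired into Nutch in any way, and how good the boundaries are depends on the JDK in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmentSketch {
    // Split text at the word boundaries reported by the JDK for the Thai locale.
    static List<String> segment(String text) {
        BreakIterator it = BreakIterator.getWordInstance(new Locale("th"));
        it.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // The Thai for "Thai language", written with escapes so the file encoding does not matter.
        String sample = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22";
        System.out.println(segment(sample));
    }
}
```

The pieces always concatenate back to the input; where exactly the boundaries fall is up to the JDK's Thai support.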
sanjeev wrote:
ok. I downloaded the LuceneInAction code examples
Arun,
No I haven't come anywhere near the solution. I am myself confused a little.
From what I've learnt, one approach is to use NutchAnalysis.jj and compile
it using javacc.
Another is to download the dev version of Nutch and try to use the patches for
the language analyzer
and identifier.
I failed