Hi,

First up, I would like to say I’m really excited by the Blur project; it seems to fit the needs of a potential project perfectly. I’m hoping that I can someday contribute back to this project in some way, as it looks like it will be of enormous help to me.

Now, on to the meat of the issue. I’m a complete search newbie. I come from a Spring/application development background but have to get involved in the search/big data field for a current client. Since the new year I have been looking at Hadoop and have set up a small cluster using Cloudera’s excellent tools. I’ve been downloading datasets, running MR jobs, etc., and think I have gleaned a very basic level of knowledge, enough for me to learn more when I need it.

This week I started looking at Blur. At present I have cloned the source onto the Hadoop namenode, where I have built and started the Blur servers. But now I am stuck and don’t know where to go next, so I will ask the following:

1 - /apache-blur-0.2.0-SNAPSHOT/conf/servers: at present I just have my namenode defined in here. Do I need to add my datanodes as well?

2 - When I try to create a table from the Blur shell, I get a connection error:

    blur> create repl-table hdfs://localhost:9000/blur/repl-table 1
    java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

I’m confused here. Is 9000 the correct port? Is there some sort of user auth issue?

3 - Assuming I create a table on HDFS, I use an MR job to import my data into it, yes? What is the best way to package this job? Do I have to include all the Blur jars, or do I install Blur on the datanodes and set a classpath? Is it possible to link to an example MR job in a Maven project? Or am I on completely the wrong track?

Thanks for your help,
Paul.
