Welcome to Blur and Hadoop! Aaron can fill in more, but I think you're on the right track. I don't know what the servers file is for, as I'm running an older version; if I had to guess, it's the list of machines you will run shard servers on. You mention a cluster, and you mention having the namenode separated, which is good. To roll Blur out you usually run a controller (it could probably live on the namenode if it's a small cluster), and you run a shard server on each datanode/tasktracker node. The hdfs:// path you entered should probably use port 8020, unless you changed the port in core-site.xml.
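To make the servers-file guess concrete, here's a minimal sketch; the hostnames node1..node3 are made up, and I'm assuming one shard-server host per line, one entry per datanode:

```shell
# Hypothetical hostnames: one shard-server host per line,
# typically one entry per datanode/tasktracker node.
mkdir -p conf
printf '%s\n' node1 node2 node3 > conf/servers
cat conf/servers
```

And for the port: whatever value fs.default.name has in core-site.xml (8020 by default on CDH-style installs) is the port the hdfs:// URI you pass to `create` needs to use.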
The last point is really a matter of preference. I roll the jars out with the MR jobs, and I have added Blur to HADOOP_CLASSPATH in hadoop-env.sh for the jobs to use. There are other ways; this one just worked really well for me. I hope that helps, let us know how it works. :)

~Garrett

On Wed, Feb 20, 2013 at 1:41 PM, Paul O'Donoghue <[email protected]> wrote:

> Hi,
>
> First up I would like to say I’m really excited by the Blur project, it seems to fit the need of a potential project perfectly. I’m hoping that I can someday contribute back to this project in some way as it seems that it will be of enormous help to me.
>
> Now, on to the meat of the issue. I’m a complete search newbie. I am coming from a Spring/Application development background but have to get involved in the Search/Big data field for a current client. Since the new year I have been looking at Hadoop and have set up a small cluster using Cloudera’s excellent tools. I’ve been downloading datasets, running MR jobs, etc. and think I have gleaned a very basic level of knowledge which is enough for me to learn more when I need it. This week I have started looking at Blur, and at present I have cloned the src to the hadoop namenode where I have built and started the blur servers. But now I am stuck, and don’t know where to go. So I will ask the following:
>
> 1 - /apache-blur-0.2.0-SNAPSHOT/conf/servers. At present I just have my namenode defined in here. Do I need to add my datanodes as well?
>
> 2 - blur> create repl-table hdfs://localhost:9000/blur/repl-table 1
> java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused.
>
> I’m confused here. Is 9000 the correct port? Is there some sort of user auth issue?
>
> 3 - Assuming I create a table on the hdfs, when I want to import my data into it I use a MR job, yes? What is the best way to package this job? Do I have to include all the Blur jars or do I install Blur on the datanodes and set a classpath? Is it possible to link to an example MR job in a maven project? Or am I on completely the wrong track?
>
> Thanks for your help,
>
> Paul.
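On question 3, here's a rough sketch of the hadoop-env.sh approach I mentioned; the Blur install path is hypothetical, so adjust it to wherever you unpacked the build:

```shell
# Append the Blur jars to HADOOP_CLASSPATH (this line would go in
# hadoop-env.sh on each node). BLUR_HOME is a made-up example path.
export BLUR_HOME=/opt/apache-blur-0.2.0-SNAPSHOT
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$BLUR_HOME/lib/*"
echo "$HADOOP_CLASSPATH"
```

With that in place on every node, the MR job jar itself doesn't need to bundle the Blur jars, since the task JVMs pick them up from the classpath.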
