Hello, the main problem I see with programming MapReduce and HBase is that one program can be written in infinitely many ways, and an HBase table can be structured in several ways, and in the end each variant performs differently. Since I am a beginner, I do not see the cause right away.
Some more details on my application, and the questions bothering me, are below.

*I-Settings:* A cluster of 6 nodes running hadoop-0.19.1 and hbase-0.19.3.

*II-My application:* I have an HBase table called "myTable" storing sensor information. The stored information is the reading value of a sensor, the timestamp of the reading, the longitude and latitude of the sensor node, and the ID of the sensor node. Assume the table is very large, on the order of terabytes.

*II.1 Design:* I thought of designing the table with a single column family "cf" and the following columns:
"cf:Value" = the value of the reading
"cf:Type" = temperature, moisture, ...
"cf:TimeStamp" = the time at which the reading was taken
"cf:Latitude" = the latitude of the sensor node
"cf:Longitude" = the longitude of the sensor node
"cf:SensorNode" = the ID of the sensor node
I did not think about any special key structure. I guess this is bad, right?

*II.2 My queries:* I would like to run different types of queries, for example:
Q1 - Calculate the average of the temperature readings in a geographical region G1, where G1: (start latitude l1, end latitude l2); (start longitude L1, end longitude L2).
Q2 - List the IDs of the nodes that measured a temperature > 30 degrees between time1 and time2.

*My questions, assuming I write a MapReduce job for query Q1:*
- I considered passing the RowResult as a value to my mapper, the same as you did in RowCounterMapper. Then I scanned the table row by row inside the map function, extracting the reading values of the matching rows (checking the reading type and the geographical region G1). My reducer is simple: it takes the list of these values and averages them. With this method the table was read NxN times, where N is the number of rows, and the performance was of course extremely slow. How can I pass the RowResult to my mapper and still read the table only once, or otherwise improve the performance? I have asked this question before, but the answer was not clear to me, so I tried another method that skips passing the RowResult to the mapper. (A stripped-down sketch of what I guess the single-scan mapper should look like is at the end of this mail.)
- In my second method, explained in my previous email, I did not pass the RowResult to the mapper; instead I passed something I never use, such as a dummy file, and kept the scanner on the table inside the map function. In this case the table was scanned only once and the performance was much better than when passing the RowResult. What does it mean if I pass the mapper a key and a value that I do not use at all? Apart from the fact that this is not the MapReduce concept, why is this bad? You agree with me that it is not a real solution, right? (The scan loop I mean is also sketched at the end.)
- Does the design of my table make sense? Could you propose a better way to do it?
- Would it be better to use some special key format, and how can it help the performance? (One idea I had is sketched at the end as well.)
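To make the first question concrete, here is roughly what I guess the single-scan mapper should look like: the map function only examines the one row the framework hands it and opens no scanner of its own. This is only a sketch based on my reading of the 0.19 TableMap/RowResult API; the class name AvgTempMapper, the hard-coded G1 bounds, and the assumption that the cell values are stored as printable strings are all mine. Is this the right way?

```java
import java.io.IOException;

import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class AvgTempMapper extends MapReduceBase
    implements TableMap<Text, FloatWritable> {

  private static final byte[] TYPE  = Bytes.toBytes("cf:Type");
  private static final byte[] VALUE = Bytes.toBytes("cf:Value");
  private static final byte[] LAT   = Bytes.toBytes("cf:Latitude");
  private static final byte[] LON   = Bytes.toBytes("cf:Longitude");

  // G1 is hard-coded only to keep the sketch short; it would normally
  // come from the JobConf.
  private static final float LAT_MIN = 47.0f, LAT_MAX = 48.0f;
  private static final float LON_MIN = 8.0f,  LON_MAX = 9.0f;

  private final Text outKey = new Text("avg-temperature");

  // The framework hands us ONE row per call; no scanner is opened here.
  public void map(ImmutableBytesWritable key, RowResult row,
      OutputCollector<Text, FloatWritable> output, Reporter reporter)
      throws IOException {
    Cell type = row.get(TYPE);
    if (type == null
        || !"temperature".equals(Bytes.toString(type.getValue()))) {
      return;  // not a temperature reading
    }
    // Assuming every row has all columns and values are printable strings.
    float lat = Float.parseFloat(Bytes.toString(row.get(LAT).getValue()));
    float lon = Float.parseFloat(Bytes.toString(row.get(LON).getValue()));
    if (lat < LAT_MIN || lat > LAT_MAX || lon < LON_MIN || lon > LON_MAX) {
      return;  // outside G1
    }
    // One constant key, so a single reducer receives all values to average.
    output.collect(outKey,
        new FloatWritable(Float.parseFloat(
            Bytes.toString(row.get(VALUE).getValue()))));
  }
}
```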
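And this is, roughly, the scan loop I run inside the map function in the second (dummy-input) method, pulled out into a standalone method and simplified; the G1 filtering is left out to keep it short. The mapper's own key/value inputs are never used, which is exactly what feels wrong to me:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScanAverage {
  // One full client-side scan over the table; this is what I currently
  // call from inside the map function of the dummy-input job.
  public static float averageTemperatures() throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "myTable");
    Scanner scanner = table.getScanner(new byte[][] {
        Bytes.toBytes("cf:Type"), Bytes.toBytes("cf:Value") });
    float sum = 0f;
    long count = 0;
    try {
      for (RowResult row : scanner) {
        String type =
            Bytes.toString(row.get(Bytes.toBytes("cf:Type")).getValue());
        if (!"temperature".equals(type)) {
          continue;  // skip non-temperature readings
        }
        sum += Float.parseFloat(
            Bytes.toString(row.get(Bytes.toBytes("cf:Value")).getValue()));
        count++;
      }
    } finally {
      scanner.close();
    }
    return count == 0 ? 0f : sum / count;
  }
}
```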
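Finally, the key-format idea I was wondering about (purely my own guess, not something I have tried): since HBase keeps rows sorted lexicographically by key, I could encode the fields I filter on into the row key itself, zero-padded to a fixed width so the string order matches the numeric order. Something like:

```java
import java.util.Locale;

import org.apache.hadoop.hbase.util.Bytes;

public class SensorRowKey {
  // Hypothetical composite row key: type/lat/lon/timestamp/nodeId.
  // Coordinates are shifted to be non-negative before zero-padding,
  // so lexicographic key order matches numeric order.
  public static byte[] makeKey(String type, double lat, double lon,
      long timestamp, String nodeId) {
    String key = String.format(Locale.US, "%s/%09.4f/%09.4f/%013d/%s",
        type, lat + 90.0, lon + 180.0, timestamp, nodeId);
    return Bytes.toBytes(key);
  }
}
// e.g. makeKey("temperature", 47.37, 8.54, 1245678901234L, "node42")
// gives "temperature/0137.3700/0188.5400/1245678901234/node42"
```

If I understand the sorting correctly, a scan could then at least be restricted to the "temperature" rows and the latitude range of G1 via start/stop rows, and only the longitude and time conditions would still have to be checked per row. Is this a sensible direction?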
Thank you, CJ