I tested it without scanners, and just inserting data causes the memory usage to rise and never recover, from what I have seen.

I submitted a job that downloaded web pages, stripped out the data needed,
and inserted it into the table via PHP to REST. I used a text file for the
input, so there were no reads from the table. Table splits were never more
than 15, so cached meta, if any, should not be the problem.

A copy of the PHP function I use to insert data is below. Basically, I open
a socket connection to the REST interface and send this:

    fputs( $fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
    fputs( $fp, "Host: ".$master_ip."\r\n");
    fputs( $fp, "Content-type: text/xml\r\n" );
    fputs( $fp, "Content-length: ".strlen($xml)."\r\n");
    fputs( $fp, "Connection: close\r\n\r\n");
    fputs( $fp, $xml."\r\n\r\n");

Then I read the returning data from the socket and close the connection.

The REST interface starts out with about 38MB of memory used, then climbs.
I have not let it crash with a copy running outside of the master, but it
did run out of memory while using the master's REST interface. It takes a
lot of transactions to use up 1GB of memory: I checked each table server,
and the sum of the edit ids on a new install was about 51 million using
just the master as the interface.

This means I have to kill and restart the process from time to time to
recover memory and keep it from crashing. Overall speed in transactions per
second remains the same, and I do not see any other problems that I can
tell.

{PHP Code Start}
<?php
function hbase_insert($master_ip, $table, $row, $col, $data) {
    //echo "row:".$row." - col:".$col." - data:".$data."\n";

    // Make both arguments arrays
    $column = is_array($col) ? $col : array($col);
    $adata  = is_array($data) ? $data : array($data);
    unset($col, $data);

    // Loop over the column array, building the XML to submit
    $xml = '<?xml version="1.0" encoding="UTF-8"?><row>';
    for ($count = count($column), $zz = 0; $zz < $count; $zz++) {
        // Make sure the column has a ":" on the end if it is not a child
        if (strpos($column[$zz], ":") === false) {
            $column[$zz] = $column[$zz].":";
        }
        // Append each column to the XML
        $xml .= '<column><name>'.$column[$zz].'</name><value>'
              . base64_encode($adata[$zz]).'</value></column>';
    }
    $xml .= '</row>';
    //echo $xml,"\n";

    $fp = hbase_connect($master_ip);
    if (!$fp) {
        return "failed";
    }

    fputs($fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
    fputs($fp, "Host: ".$master_ip."\r\n");
    fputs($fp, "Content-type: text/xml\r\n");
    fputs($fp, "Content-length: ".strlen($xml)."\r\n");
    fputs($fp, "Connection: close\r\n\r\n");
    fputs($fp, $xml."\r\n\r\n");

    // Read the whole response from the server
    $buff = "";
    while (!feof($fp)) {
        $buff .= fgets($fp, 1024);
    }
    fclose($fp);

    // Return the raw response unless the server answered 200 OK
    if (strpos($buff, "HTTP/1.1 200 OK") === false) {
        return $buff;
    }
    return "success";
}

function hbase_connect($master_ip) {
    // Try the local REST interface first
    $fp = fsockopen("127.0.0.1", 60050, $errno, $errstr, 10);
    if ($fp) {
        echo "Localhost\n";
        return $fp;
    }
    // Fall back to the master's REST interface
    $fp = fsockopen($master_ip, 60010, $errno, $errstr, 10);
    if ($fp) {
        echo "Master\n";
        return $fp;
    }
    // Return false so the !$fp check in hbase_insert() catches the failure
    return false;
}
?>
{PHP Code End}

"Bryan Duxbury" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Are you closing the scanners when you're done? If not, those might be
> hanging around for a long time. I don't think we've built in the proper
> timeout logic to make that work by itself.
>
> -Bryan
>
> On Dec 21, 2007, at 5:10 PM, Billy wrote:
>
>> I was thinking the same thing and have been running REST outside of the
>> Master on each server for about 5 hours now, using the master as a
>> backup if the local REST interface failed. You are right, I have seen a
>> little faster processing time from doing this vs. using just the master.
>>
>> It seems the problem is not with the master itself. It looks like REST
>> is using up more and more memory. I am not sure, but I think it has to
>> do with inserts. Maybe not, but the memory usage is going up. I am
>> running a scanner with 2 threads reading rows, processing the data, and
>> inserting it into a separate table, building an inverted index.
>>
>> I will restart everything when this job is done and try to do just
>> inserts and see if it's the scanner or the inserts.
>>
>> The master is holding at about 75MB, and the REST interfaces are up to
>> 400MB and slowly going up on the ones running the jobs.
>>
>> I am still testing. I will see what else I can come up with.
>>
>> Billy
>>
>>
>> "stack" <[EMAIL PROTECTED]> wrote in message
>> news:[EMAIL PROTECTED]
>>> Hey Billy:
>>>
>>> Master itself should use little memory and though it is not out of the
>>> realm of possibilities, it should not have a leak.
>>>
>>> Are you running with the default heap size? You might want to give it
>>> more memory if you are (See
>>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#3 for how).
>>>
>>> If you are uploading all via the REST server running on the master, the
>>> problem, as you speculate, could be in the REST servlet itself (though
>>> it looks like it shouldn't be holding on to anything, having given it a
>>> cursory glance). You could try running the REST server independent of
>>> the master. Grep for 'Starting the REST Server' in this page,
>>> http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest, for how (If you
>>> are only running one REST instance, your upload might go faster if you
>>> run multiple).
>>>
>>> St.Ack
>>>
>>>
>>> Billy wrote:
>>>> I forgot to say that once restarted, the master only uses about 70MB
>>>> of memory.
>>>>
>>>> Billy
>>>>
>>>> "Billy" <[EMAIL PROTECTED]> wrote
>>>> in message news:[EMAIL PROTECTED]
>>>>
>>>>> I am not sure of this, but why does the master server use up so much
>>>>> memory? I have been running a script that has been inserting data
>>>>> into a table for a little over 24 hours, and the master crashed
>>>>> because of java.lang.OutOfMemoryError: Java heap space.
>>>>>
>>>>> So my question is why does the master use up so much memory? At most
>>>>> it should store the -ROOT- and .META. tables in memory and the block
>>>>> to table mapping.
>>>>>
>>>>> Is it cache or a memory leak?
>>>>>
>>>>> I am using the REST interface, so could that be the reason?
>>>>>
>>>>> According to the high edit ids on all the region servers, I inserted
>>>>> about 51,932,760 edits, and the master ran out of memory with a heap
>>>>> of about 1GB.
>>>>>
>>>>> The other side to this is that the data I inserted is only taking up
>>>>> 886.61 MB, and that's with dfs.replication set to 2, so half that is
>>>>> only 440MB of data compressed at the block level.
>>>>> From what I understand, the master should have lower memory and CPU
>>>>> usage, and the namenode on hadoop should be the memory hog, since it
>>>>> has to keep up with all the data about the blocks.
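
For anyone trying to reproduce the insert-only test, it may help to look at the request body separately from the socket code. The sketch below is a hypothetical helper (hbase_row_xml is not part of the script above) that builds the same XML row body that hbase_insert() sends, so the payload can be inspected without a running REST interface. The column name 'page' and the value 'hello' are made-up sample inputs.

```php
<?php
// Hypothetical helper, not part of the original script: builds the same
// XML row body that hbase_insert() sends, without opening a socket.
function hbase_row_xml($col, $data) {
    // Accept a single column/value or parallel arrays, as hbase_insert() does.
    $columns = is_array($col) ? array_values($col) : array($col);
    $values  = is_array($data) ? array_values($data) : array($data);

    $xml = '<?xml version="1.0" encoding="UTF-8"?><row>';
    foreach ($columns as $i => $name) {
        // A bare column family name gets a trailing ":".
        if (strpos($name, ':') === false) {
            $name .= ':';
        }
        // Values are base64-encoded, as in the original function.
        $xml .= '<column><name>' . $name . '</name><value>'
              . base64_encode($values[$i]) . '</value></column>';
    }
    return $xml . '</row>';
}

// Sample inputs (assumed for illustration only).
echo hbase_row_xml('page', 'hello'), "\n";
```

With those sample inputs the value element should carry aGVsbG8=, the base64 form of "hello", and the column name should come out as "page:". Like the original function, this sketch does not escape XML special characters in column names.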