I tested it without scanners, and just inserting data causes the memory 
usage to rise and never recover, from what I have seen.

I submitted a job that downloads web pages, strips out the data needed, and 
inserts it into the table via PHP to REST. I used a text file for the input, 
so there were no reads against the table. Table splits never numbered more 
than 15, so cached META, if any, should not be the problem.

A copy of the PHP function I use to insert data is below.

Basically, I open a socket connection to the REST interface and send this:

fputs( $fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
fputs( $fp, "Host: ".$master_ip."\r\n");
fputs( $fp, "Content-type: text/xml\r\n" );
fputs( $fp, "Content-length: ".strlen($xml)."\r\n");
fputs( $fp, "Connection: close\r\n\r\n");
fputs( $fp, $xml."\r\n\r\n");

Then I read the response from the socket and close the connection.
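
For reference, the read-and-check step looks roughly like this (a minimal 
sketch; the full function below simply searches the whole buffer for 
"200 OK", while this pulls out the status line explicitly):

$buff = "";
while (!feof($fp)){
 $buff .= fgets($fp, 1024); // read until the server closes the connection
} // end while
fclose($fp);
$parts = explode("\r\n\r\n", $buff, 2); // split the headers from the body
$status = strtok($parts[0], "\r\n");    // e.g. "HTTP/1.1 200 OK"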


The REST interface starts out using about 38 MB of memory and then climbs. I 
have not let it crash with a copy running outside of the master, but it did 
run out of memory while using the master's REST interface. It takes a lot of 
transactions to use up 1 GB of memory: I checked each table server, and the 
sum of the edit IDs on a new install was about 51 million, using just the 
master as the interface.

This forces me to kill and restart the process from time to time to recover 
memory and keep it from crashing.
Overall, speed remains the same in transactions per second, and I do not see 
any other problems as far as I can tell.

{PHP Code Start}
<?php
// Insert one row into an HBase table via the REST interface.
// $col and $data may be single values or parallel arrays of
// column names and values.
function hbase_insert($master_ip, $table, $row, $col, $data){
 // Normalize the column and data arguments into arrays.
 if (!is_array($col)) {
  $column = array($col);
 } else {
  $column = $col;
 } // end if
 unset($col);
 if (!is_array($data)){
  $adata = array($data);
 } else {
  $adata = $data;
 } // end if
 unset($data);
 // Loop over the columns, building the XML body to submit.
 $xml = '<?xml version="1.0" encoding="UTF-8"?><row>';
 for ($count = count($column), $zz = 0; $zz < $count; $zz++){
  // Make sure the column name ends with a ':' if it has no qualifier.
  if (strpos($column[$zz], ":") === false){
   $column[$zz] = $column[$zz].":";
  } // end if
  // Append each column to the XML body; values are base64 encoded.
  $xml .= 
'<column><name>'.$column[$zz].'</name><value>'.base64_encode($adata[$zz]).'</value></column>';
 } // end for
 $xml .= '</row>';
 $fp = hbase_connect($master_ip);
 if (!$fp){
  return "failed";
 } // end if
 // Send the PUT request headers and the XML body.
 fputs($fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
 fputs($fp, "Host: ".$master_ip."\r\n");
 fputs($fp, "Content-type: text/xml\r\n");
 fputs($fp, "Content-length: ".strlen($xml)."\r\n");
 fputs($fp, "Connection: close\r\n\r\n");
 fputs($fp, $xml."\r\n\r\n");

 // Read the full response from the server.
 $buff = "";
 while (!feof($fp)){
  $buff .= fgets($fp, 1024);
 } // end while
 fclose($fp);
 if (strpos($buff, "HTTP/1.1 200 OK") === false){
  return $buff;
 } else {
  return "success";
 } // end if
} // end function

// Open a socket to a REST server: try a local instance first, then
// fall back to the REST interface running on the master.
function hbase_connect($master_ip){
 $fp = @fsockopen("127.0.0.1", 60050, $errno, $errstr, 10);
 if ($fp){
  echo "Localhost\n";
  return $fp;
 } // end if
 $fp = @fsockopen($master_ip, 60010, $errno, $errstr, 10);
 if ($fp){
  echo "Master\n";
  return $fp;
 } // end if
 // Return false (not -1) so callers can test the result with !$fp.
 return false;
} // end function

?>
{PHP Code End}
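
For completeness, the driver that feeds this function is roughly the loop 
below. This is a hypothetical sketch: the include file name, input file 
name, table name, and column names are made up for illustration, but the 
call matches hbase_insert() above.

<?php
// Hypothetical driver: read tab-delimited rows from a local text file
// and insert each one through the REST interface (illustrative names).
require_once "hbase_rest.php"; // the insert/connect functions above
$master_ip = "192.168.1.10";
$lines = file("pages.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
 list($row, $title, $body) = explode("\t", $line, 3);
 $result = hbase_insert($master_ip, "webpages", $row,
                        array("page:title", "page:body"),
                        array($title, $body));
 if ($result !== "success") {
  echo "insert failed for row ".$row.": ".$result."\n";
 } // end if
} // end foreach
?>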

"Bryan Duxbury" <[EMAIL PROTECTED]> wrote in 
message news:[EMAIL PROTECTED]
> Are you closing the scanners when you're done? If not, those might be
> hanging around for a long time. I don't think we've built in the proper
> timeout logic to make that work by itself.
>
> -Bryan
>
> On Dec 21, 2007, at 5:10 PM, Billy wrote:
>
>> I was thinking the same thing and have been running REST outside of the
>> Master on each server for about 5 hours now, using the master as a
>> backup if the local REST interface failed. You are right, I saw slightly
>> faster processing time doing this vs. using just the master.
>>
>> It seems the problem is not with the master itself; it looks like REST
>> is using up more and more memory. I am not sure, but I think it has to
>> do with inserts. Maybe not, but the memory usage is going up. I am
>> running a scanner with 2 threads reading rows, processing the data, and
>> inserting it into a separate table to build an inverted index.
>>
>> I will restart everything when this job is done and try doing just
>> inserts to see whether it is the scanner or the inserts.
>>
>> The master is holding at about 75 MB, and the REST interfaces are up to
>> 400 MB and slowly climbing on the ones running the jobs.
>>
>> I am still testing; I will see what else I can come up with.
>>
>> Billy
>>
>>
>> "stack" <[EMAIL PROTECTED]> wrote in message
>> news:[EMAIL PROTECTED]
>>> Hey Billy:
>>>
>>> The Master itself should use little memory, and though it is not out
>>> of the realm of possibilities, it should not have a leak.
>>>
>>> Are you running with the default heap size? You might want to give it
>>> more memory if you are (see
>>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#3 for how).
>>>
>>> If you are uploading everything via the REST server running on the
>>> master, the problem, as you speculate, could be in the REST servlet
>>> itself (though it looks like it shouldn't be holding on to anything,
>>> having given it a cursory glance). You could try running the REST
>>> server independent of the master. Grep for 'Starting the REST Server'
>>> in this page, http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest,
>>> for how (if you are only running one REST instance, your upload might
>>> go faster if you run multiple).
>>>
>>> St.Ack
>>>
>>>
>>> Billy wrote:
>>>> I forgot to say that once restarted, the master only uses about 70 MB
>>>> of memory.
>>>>
>>>> Billy
>>>>
>>>> "Billy" <[EMAIL PROTECTED]> wrote
>>>> in message news:[EMAIL PROTECTED]
>>>>
>>>>> I am not sure about this, but why does the master server use up so
>>>>> much memory? I have been running a script that has been inserting
>>>>> data into a table for a little over 24 hours, and the master crashed
>>>>> because of java.lang.OutOfMemoryError: Java heap space.
>>>>>
>>>>> So my question is why the master uses up so much memory; at most it
>>>>> should store the -ROOT- and .META. tables in memory and the
>>>>> block-to-table mapping.
>>>>>
>>>>> Is it cache or a memory leak?
>>>>>
>>>>> I am using the REST interface, so could that be the reason?
>>>>>
>>>>> According to the high edit IDs on all the region servers, I inserted
>>>>> about 51,932,760 edits, and the master ran out of memory with a heap
>>>>> of about 1 GB.
>>>>>
>>>>> The other side to this is that the data I inserted is only taking up
>>>>> 886.61 MB, and that's with dfs.replication set to 2, so half of that
>>>>> is only 440 MB of data, compressed at the block level.
>>>>> From what I understand, the master should have lower memory and CPU
>>>>> usage, and the namenode on Hadoop should be the memory hog, since it
>>>>> has to keep up with all the data about the blocks.


