Any ideas on what might be causing the memory usage?

Billy

"Billy" <[EMAIL PROTECTED]> wrote in 
message news:[EMAIL PROTECTED]
> I tested it without scanners, and just inserting data causes the memory 
> usage to rise and never recover, from what I've seen.
>
> I submitted a job that downloads web pages, strips out the needed data, and 
> inserts it into the table via PHP to REST. I used a text file for the 
> input, so there were no reads to the table. Table splits were never more 
> than 15, so cached meta, if any, should not be the problem.
>
> A copy of the PHP function I use to insert data is below.
>
> But basically I open a socket connection to the REST interface
> and send this:
>
> fputs( $fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
> fputs( $fp, "Host: ".$master_ip."\r\n");
> fputs( $fp, "Content-type: text/xml\r\n" );
> fputs( $fp, "Content-length: ".strlen($xml)."\r\n");
> fputs( $fp, "Connection: close\r\n\r\n");
> fputs( $fp, $xml."\r\n\r\n");
>
> Then I read the response data from the socket and close the connection.
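>
> For reference, the raw request those fputs() calls produce looks roughly 
> like this (the table, row key, host, and body below are made up for 
> illustration; note that Content-Length must match the body's byte count 
> exactly):
>
> PUT /api/webdata/row/com.example/ HTTP/1.1
> Host: 192.168.0.1
> Content-Type: text/xml
> Content-Length: 107
> Connection: close
>
> <?xml version="1.0" encoding="UTF-8"?><row><column><name>data:</name><value>aGVsbG8=</value></column></row>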
>
>
> The REST interface starts out with about 38 MB of memory used, then climbs. 
> I have not let it crash with a copy running outside of the master, but it 
> did run out of memory while using the master's REST interface. It takes a 
> lot of transactions to use up 1 GB of memory: I checked each table server, 
> and the sum of the edit IDs on a new install was about 51 million, using 
> just the master as the interface.
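>
> As a rough back-of-the-envelope check (my own numbers): 1 GB of heap lost 
> over ~51 million edits works out to roughly 20 bytes retained per edit, 
> which looks more like a small per-request leak than one big allocation.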
>
> This forces me to kill and restart the process from time to time to recover 
> memory and keep it from crashing.
> Overall, speed remains the same in transactions/sec, and I do not see any 
> other problems, as far as I can tell.
>
> {PHP Code Start}
> <?php
> function hbase_insert($master_ip, $table, $row, $col, $data){
>     //echo "row:".$row." - col:".$col." - data:".$data."\n";
>     // Normalize the column and data arguments so both are always arrays.
>     $column = is_array($col) ? $col : array($col);
>     $adata  = is_array($data) ? $data : array($data);
>     // Loop over the column array, building the XML row document to submit.
>     $xml = '<?xml version="1.0" encoding="UTF-8"?><row>';
>     for ($count = count($column), $zz = 0; $zz < $count; $zz++){
>         // Make sure the column has a ':' on the end if it has no child qualifier.
>         if (strpos($column[$zz], ':') === false){
>             $column[$zz] .= ':';
>         } // end if
>         // Append each column to the XML; values are base64-encoded.
>         $xml .= '<column><name>'.$column[$zz].'</name><value>'
>               . base64_encode($adata[$zz]).'</value></column>';
>     } // end for
>     $xml .= '</row>';
>     //echo $xml,"\n";
>     $fp = hbase_connect($master_ip);
>     if (!$fp){
>         return "failed";
>     } // end if
>     // Send the PUT request. Content-Length must match the body exactly,
>     // so no extra CRLFs are appended after the XML.
>     fputs($fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
>     fputs($fp, "Host: ".$master_ip."\r\n");
>     fputs($fp, "Content-Type: text/xml\r\n");
>     fputs($fp, "Content-Length: ".strlen($xml)."\r\n");
>     fputs($fp, "Connection: close\r\n\r\n");
>     fputs($fp, $xml);
>
>     // Read the whole response from the server, then close the socket.
>     $buff = "";
>     while (!feof($fp)){
>         $buff .= fgets($fp, 1024);
>     } // end while
>     fclose($fp);
>     if (strpos($buff, "HTTP/1.1 200 OK") === false){
>         return $buff;
>     } else {
>         return "success";
>     } // end if
> } // end function
>
> function hbase_connect($master_ip){
>     // Try a local REST server first, then fall back to the master's.
>     $fp = @fsockopen("127.0.0.1", 60050, $errno, $errstr, 10);
>     if ($fp){
>         echo "Localhost\n";
>         return $fp;
>     }
>     $fp = @fsockopen($master_ip, 60010, $errno, $errstr, 10);
>     if ($fp){
>         echo "Master\n";
>         return $fp;
>     }
>     // Return false (not -1): -1 is truthy, so the caller's !$fp check
>     // would never catch a failed connection.
>     return false;
> } // end function
> ?>
> {PHP Code End}
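>
> A minimal usage sketch (the master IP, table name, row key, and values here 
> are made up for illustration):
>
> {PHP Code Start}
> <?php
> // Assumes hbase_insert() and hbase_connect() from above are loaded.
> // Single-column insert; a bare family name gets ':' appended automatically.
> $result = hbase_insert("192.168.0.1", "webdata", "com.example",
>                        "content", "<html>hello</html>");
> echo $result, "\n"; // "success", "failed", or the raw error response
>
> // Multi-column insert: parallel arrays of column names and values.
> $cols = array("content:raw", "meta:fetched");
> $vals = array("<html>hello</html>", date("c"));
> echo hbase_insert("192.168.0.1", "webdata", "com.example", $cols, $vals), "\n";
> ?>
> {PHP Code End}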
>
> "Bryan Duxbury" <[EMAIL PROTECTED]> wrote in 
> message 
> news:[EMAIL PROTECTED]
>> Are you closing the scanners when you're done? If not, those might be 
>> hanging around for a long time. I don't think we've built in the proper 
>> timeout logic to make that work by itself.
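>>
>> Over REST, closing a scanner amounts to an HTTP DELETE against the 
>> scanner's URI. A minimal sketch in PHP (the scanner path shape below is an 
>> assumption, not confirmed; check the HbaseRest wiki page for the exact URI 
>> your version returns when a scanner is opened):
>>
>> {PHP Code Start}
>> <?php
>> // Release a server-side scanner by DELETE-ing the URI the server handed
>> // back when the scanner was opened, e.g. "/api/mytable/scanner/1234"
>> // (that path shape is illustrative).
>> function close_scanner($host, $port, $scanner_uri) {
>>     $fp = fsockopen($host, $port, $errno, $errstr, 10);
>>     if (!$fp) {
>>         return false;
>>     }
>>     fputs($fp, "DELETE ".$scanner_uri." HTTP/1.1\r\n");
>>     fputs($fp, "Host: ".$host."\r\n");
>>     fputs($fp, "Connection: close\r\n\r\n");
>>     $resp = "";
>>     while (!feof($fp)) {
>>         $resp .= fgets($fp, 1024);
>>     }
>>     fclose($fp);
>>     return strpos($resp, "200 OK") !== false;
>> }
>> ?>
>> {PHP Code End}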
>>
>> -Bryan
>>
>> On Dec 21, 2007, at 5:10 PM, Billy wrote:
>>
>>> I was thinking the same thing, and I've been running REST outside of the 
>>> Master on each server for about 5 hours now, using the master as a backup 
>>> if the local REST interface failed. You are right: I saw a little faster 
>>> processing time from doing this vs. using just the master.
>>>
>>> Seems the problem is not with the master itself; it looks like REST is 
>>> using up more and more memory. I'm not sure, but I think it has to do 
>>> with inserts; maybe not, but the memory usage is going up. I am running a 
>>> scanner with 2 threads reading rows, processing the data, and inserting 
>>> it into a separate table, building an inverted index.
>>>
>>> I will restart everything when this job is done and try to do just 
>>> inserts, to see if it's the scanner or the inserts.
>>>
>>> The master is holding at about 75 MB, and the REST interfaces are up to 
>>> 400 MB and slowly climbing on the ones running the jobs.
>>>
>>> I am still testing; I will see what else I can come up with.
>>>
>>> Billy
>>>
>>>
>>> "stack" <[EMAIL PROTECTED]> wrote in message
>>> news:[EMAIL PROTECTED]
>>>> Hey Billy:
>>>>
>>>> The Master itself should use little memory, and though it is not out of 
>>>> the realm of possibilities, it should not have a leak.
>>>>
>>>> Are you running with the default heap size? You might want to give it 
>>>> more memory if you are (see
>>>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#3 for how).
>>>>
>>>> If you are uploading everything via the REST server running on the 
>>>> master, the problem, as you speculate, could be in the REST servlet 
>>>> itself (though, having given it a cursory glance, it looks like it 
>>>> shouldn't be holding on to anything). You could try running the REST 
>>>> server independent of the master. Grep for 'Starting the REST Server' on 
>>>> this page, http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest, for 
>>>> how. (If you are only running one REST instance, your upload might also 
>>>> go faster if you run multiple.)
>>>>
>>>> St.Ack
>>>>
>>>>
>>>> Billy wrote:
>>>>> I forgot to say that once restarted, the master only uses about 70 MB 
>>>>> of memory.
>>>>>
>>>>> Billy
>>>>>
>>>>> "Billy" <[EMAIL PROTECTED]> 
>>>>> wrote
>>>>> in message news:[EMAIL PROTECTED]
>>>>>
>>>>>> I'm not sure about this, but why does the master server use up so much 
>>>>>> memory? I've been running a script that has been inserting data into a 
>>>>>> table for a little over 24 hours, and the master crashed because of
>>>>>> java.lang.OutOfMemoryError: Java heap space.
>>>>>>
>>>>>> So my question is: why does the master use up so much memory? At most 
>>>>>> it should store the -ROOT- and .META. tables in memory and the 
>>>>>> block-to-table mapping.
>>>>>>
>>>>>> Is it cache or a memory leak?
>>>>>>
>>>>>> I am using the REST interface, so could that be the reason?
>>>>>>
>>>>>> I inserted, going by the highest edit IDs on all the region servers, 
>>>>>> about 51,932,760 edits, and the master ran out of memory with a heap 
>>>>>> of about 1 GB.
>>>>>>
>>>>>> The other side of this is that the data I inserted is only taking up 
>>>>>> 886.61 MB, and that's with dfs.replication set to 2, so half of that 
>>>>>> is only about 440 MB of data, compressed at the block level.
>>>>>> From what I understand, the master should have low memory and CPU 
>>>>>> usage, and the namenode on Hadoop should be the memory hog, since it 
>>>>>> has to keep up with all the data about the blocks.
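>>>>>>
>>>>>> Rough math on that (back-of-the-envelope): ~440 MB of unreplicated, 
>>>>>> block-compressed data over ~51.9 million edits is under 9 bytes stored 
>>>>>> per edit, far less than the ~1 GB of heap the master chewed through 
>>>>>> over the same run, so the growth does not look like simple caching of 
>>>>>> row data.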
>>>>>>


