Any ideas on what might be causing the memory usage?

Billy
"Billy" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >I tested it with out scanners and just inserting data casue the memory >useage to rise and never recover from what I seen. > > I submitted a job that download web pages striped out the data needed and > inserted it in to the table via php to REST. i used a trext file for the > input so no reads to the table. table splits where never more then 15 so > cached meta if any should not be the problem. > > a copy of the insert php function code I use to insert data is below > > But basicly I open a socket connection to the REST interface > send send this: > > fputs( $fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n"); > fputs( $fp, "Host: ".$master_ip."\r\n"); > fputs( $fp, "Content-type: text/xml\r\n" ); > fputs( $fp, "Content-length: ".strlen($xml)."\r\n"); > fputs( $fp, "Connection: close\r\n\r\n"); > fputs( $fp, $xml."\r\n\r\n"); > > Then get the returning data from the socket and close the connection. > > > The REST interface starts out with about 38MB used in memory then climbs. > I have not let it crash with a copy running out side of the master but it > did run out of memory while using the masters REST Interface. It takes a > lot of transactions to use up 1gb of memory I checked each table server > and the sum of the edit ids on a new install was about 51 million using > just the master as the interface. > > This causes me to have to kill and restart the proc to recover memory from > time to time to keep it from crashing. > Over all speed remains the same for transactions sec and I do not see any > other problems that I can tell. > > {PHP Code Start} > <? > function hbase_insert($master_ip,$table,$row,$col,$data){ > //echo "row:".$row." - col:".$col." - data:".$data."\n"; > // make all arrays > if (!is_array($col)) { > $column[0] = $col; > } else { > $column = $col; > } // end if > unset($col); > if (!is_array($data)){ > $adata[0] = $data; > } else { > $adata = $data; > } // end if > unset($data); > // loop col array building xml to submit > $xml = '<?xml version="1.0" encoding="UTF-8"?><row>'; > for ($count=count($column), $zz=0; $zz<$count; $zz++){ > //make sure the col has a : on the end if its not a child > if (!ereg(":",$column[$zz])){ > $column[$zz] = $column[$zz].":"; > } // end if > //append each column to the xml filed > $xml .= > '<column><name>'.$column[$zz].'</name><value>'.base64_encode($adata[$zz]).'</value></column>'; > } // end for > $xml .= '</row>'; > //echo $xml,"\n"; > $fp = hbase_connect($master_ip); > if (!$fp){ > return "failed"; > } // endif > fputs( $fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n"); > fputs( $fp, "Host: ".$master_ip."\r\n"); > fputs( $fp, "Content-type: text/xml\r\n" ); > fputs( $fp, "Content-length: ".strlen($xml)."\r\n"); > fputs( $fp, "Connection: close\r\n\r\n"); > fputs( $fp, $xml."\r\n\r\n"); > > //loop through the response from the server > $buff = ""; > while(!feof($fp)){ > $buff .= fgets($fp, 1024); > } // end while > fclose($fp); > if (!ereg("HTTP/1.1 200 OK",$buff)){ > return $buff; > } else { > return "success"; > } // end if > } // end function > > function hbase_connect($master_ip){ > $fp = fsockopen("127.0.0.1", "60050", $errno, $errstr, $timeout = 10); > if ($fp){ > echo "Localhost\n"; > return $fp; > } else { > $fp = fsockopen($master_ip, "60010", $errno, $errstr, $timeout = 10); > if ($fp){ > echo "Master\n"; > return $fp; > } else { > return -1; > } // end if > } // end if > } // end function > > ?> > {PHP Code End} > > "Bryan Duxbury" <[EMAIL 
> "Bryan Duxbury" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>> Are you closing the scanners when you're done? If not, those might be
>> hanging around for a long time. I don't think we've built in the proper
>> timeout logic to make that work by itself.
>>
>> -Bryan
>>
>> On Dec 21, 2007, at 5:10 PM, Billy wrote:
>>
>>> I was thinking the same thing and have been running REST outside of the
>>> master on each server for about 5 hours now, using the master as a
>>> backup if the local REST interface fails. You are right: I have seen a
>>> little faster processing time from doing this vs. using just the master.
>>>
>>> It seems the problem is not with the master itself; it looks like REST
>>> is using up more and more memory. I'm not sure, but I think it has to do
>>> with the inserts; maybe not, but the memory usage is going up. I am
>>> running a scanner with 2 threads reading rows, processing the data, and
>>> inserting it into a separate table, building an inverted index.
>>>
>>> I will restart everything when this job is done and try doing just
>>> inserts to see whether it is the scanner or the inserts.
>>>
>>> The master is holding at about 75 MB, and the REST interfaces are up to
>>> 400 MB and slowly climbing on the ones running the jobs.
>>>
>>> I am still testing; I will see what else I can come up with.
>>>
>>> Billy
>>>
>>> "stack" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>>>> Hey Billy:
>>>>
>>>> The master itself should use little memory, and though it is not out
>>>> of the realm of possibilities, it should not have a leak.
>>>>
>>>> Are you running with the default heap size? You might want to give it
>>>> more memory if you are (see
>>>> http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#3 for how).
>>>>
>>>> If you are uploading everything via the REST server running on the
>>>> master, the problem, as you speculate, could be in the REST servlet
>>>> itself (though from a cursory glance it looks like it shouldn't be
>>>> holding on to anything). You could try running the REST server
>>>> independent of the master. Grep for 'Starting the REST Server' in this
>>>> page, http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest, for how.
>>>> (If you are only running one REST instance, your upload might go
>>>> faster if you run multiple.)
>>>>
>>>> St.Ack
>>>>
>>>>
>>>> Billy wrote:
>>>>> I forgot to say that once restarted, the master only uses about 70 MB
>>>>> of memory.
>>>>>
>>>>> Billy
>>>>>
>>>>> "Billy" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>>>>>
>>>>>> I'm not sure about this, but why does the master server use up so
>>>>>> much memory? I have been running a script that has been inserting
>>>>>> data into a table for a little over 24 hours, and the master crashed
>>>>>> because of java.lang.OutOfMemoryError: Java heap space.
>>>>>>
>>>>>> So my question is: why does the master use up so much memory? At most
>>>>>> it should store the -ROOT- and .META. tables in memory and the block
>>>>>> to table mapping.
>>>>>>
>>>>>> Is it cache or a memory leak?
>>>>>>
>>>>>> I am using the REST interface, so could that be the reason?
>>>>>>
>>>>>> According to the high edit ids on all the region servers, I inserted
>>>>>> about 51,932,760 edits, and the master ran out of memory with a heap
>>>>>> of about 1 GB.
>>>>>>
>>>>>> The other side to this is that the data I inserted is only taking up
>>>>>> 886.61 MB, and that's with dfs.replication set to 2, so half of that
>>>>>> is only about 440 MB of data, compressed at the block level.
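Taking the figures in that message at face value (about 51.9 million edits against a heap of roughly 1 GB, and 886.61 MB of stored data at dfs.replication 2), a quick back-of-the-envelope calculation puts the exhausted heap at roughly 20 bytes per edit versus under 10 bytes of compressed data per edit; that hints at something small being retained per request rather than cached table data, though that is only a guess. A sketch of the arithmetic:

{PHP Code Start}
<?php
// Back-of-the-envelope, using only the figures quoted in this thread.
$edits      = 51932760;                  // sum of edit ids across the region servers
$heap_bytes = 1 * 1024 * 1024 * 1024;    // ~1 GB heap exhausted
$data_bytes = (886.61 / 2) * 1024 * 1024; // ~443 MB of data once replication is removed

printf("heap per edit: ~%.1f bytes\n", $heap_bytes / $edits); // ~20.7 bytes
printf("data per edit: ~%.1f bytes\n", $data_bytes / $edits); // ~8.9 bytes (block compressed)
?>
{PHP Code End}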
>>>>>> From what I understand, the master should have lower memory and CPU
>>>>>> usage, and the namenode on Hadoop should be the memory hog, since it
>>>>>> has to keep up with all the data about the blocks.
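Since running several standalone REST instances came up above (stack's suggestion, and Billy already runs one per server as a fallback), here is a rough sketch of what spreading the inserts across them could look like on the client side. The host list is made up, and it assumes each region server runs a standalone REST instance on port 60050, the same port hbase_connect() uses for the local REST server; the table, row, and column names are placeholders. The request itself mirrors the PUT that hbase_insert() builds.

{PHP Code Start}
<?php
// Illustrative only: replace with the real region server hosts.
$rest_servers = array("192.168.0.11", "192.168.0.12", "192.168.0.13");

// Send one PUT to a specific REST server. $xml is the <row>...</row>
// document in the same format hbase_insert() assembles.
function hbase_put($host, $port, $table, $row, $xml){
    $fp = fsockopen($host, $port, $errno, $errstr, 10);
    if (!$fp){
        return "failed";
    }
    fputs($fp, "PUT /api/".$table."/row/".$row."/ HTTP/1.1\r\n");
    fputs($fp, "Host: ".$host."\r\n");
    fputs($fp, "Content-type: text/xml\r\n");
    fputs($fp, "Content-length: ".strlen($xml)."\r\n");
    fputs($fp, "Connection: close\r\n\r\n");
    fputs($fp, $xml."\r\n\r\n");
    // Read the whole response, then close.
    $buff = "";
    while(!feof($fp)){
        $buff .= fgets($fp, 1024);
    }
    fclose($fp);
    return (strpos($buff, "HTTP/1.1 200 OK") !== false) ? "success" : $buff;
}

// Pick the next REST server on each call instead of funneling every
// insert through the master's built-in REST interface.
function hbase_pick_server($rest_servers){
    static $next = 0;
    return $rest_servers[$next++ % count($rest_servers)];
}

// Example use with placeholder names.
$xml = '<?xml version="1.0" encoding="UTF-8"?><row>'
     . '<column><name>page:</name><value>'.base64_encode("example value").'</value></column>'
     . '</row>';
echo hbase_put(hbase_pick_server($rest_servers), 60050, "webdata", "row1", $xml), "\n";
?>
{PHP Code End}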