[
https://issues.apache.org/jira/browse/HADOOP-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HADOOP-2434:
--------------------------
Attachment: TestMapFile.java
Another version of the attachment. It does 100k instead of 1M because 1M takes
too long (and it outputs state every 1000 reads so folks don't think the program
is dead). Here is output from my cluster, first using branch-0.15:
{code}
[EMAIL PROTECTED] branch-0.15]$ nohup ./bin/hadoop
org.apache.hadoop.io.TestMapFile > testmapfile.timings < /dev/null &
[1] 19284
[EMAIL PROTECTED] branch-0.15]$ tail -f testmapfile.timings
.07/12/15 20:02:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
07/12/15 20:02:17 INFO io.TestMapFile: Writing 100000 rows to
testMapFileRandomAccess
07/12/15 20:02:38 INFO io.TestMapFile: Writing 100000 records took 20975ms
07/12/15 20:02:38 INFO io.TestMapFile: Reading 100000 random rows
07/12/15 20:02:40 INFO io.TestMapFile: Read 1000
07/12/15 20:02:41 INFO io.TestMapFile: Read 2000
07/12/15 20:02:43 INFO io.TestMapFile: Read 3000
07/12/15 20:02:44 INFO io.TestMapFile: Read 4000
07/12/15 20:02:45 INFO io.TestMapFile: Read 5000
07/12/15 20:02:47 INFO io.TestMapFile: Read 6000
07/12/15 20:02:48 INFO io.TestMapFile: Read 7000
07/12/15 20:02:49 INFO io.TestMapFile: Read 8000
07/12/15 20:02:50 INFO io.TestMapFile: Read 9000
07/12/15 20:02:52 INFO io.TestMapFile: Read 10000
...
{code}
Now using TRUNK:
{code}
[EMAIL PROTECTED] hadoop-trunk]$ nohup ./bin/hadoop
org.apache.hadoop.io.TestMapFile > testmapfile.timings < /dev/null &
[EMAIL PROTECTED] hadoop-trunk]$ tail -f testmapfile.timings
.07/12/15 19:44:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
07/12/15 19:44:27 INFO io.TestMapFile: Writing 100000 rows to
testMapFileRandomAccess
07/12/15 19:44:48 INFO io.TestMapFile: Writing 100000 records took 20326ms
07/12/15 19:44:48 INFO io.TestMapFile: Reading 100000 random rows
07/12/15 19:49:17 INFO io.TestMapFile: Read 1000
07/12/15 19:53:40 INFO io.TestMapFile: Read 2000
07/12/15 19:57:47 INFO io.TestMapFile: Read 3000
...
{code}
TRUNK takes about 13 minutes to read 3k rows. Branch 0.15 does the same random
reads in ~5 seconds. That's more than 2 orders of magnitude slower.
To be sure, I retried the johano position cache patch, and it doesn't fix the
problem.
Paul's suggestion that we are reading a byte at a time would seem to jibe with
the timings seen.
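To make that point concrete, here is a throwaway sketch of the cost of byte-at-a-time reads. This is plain local java.io, not the DFS client path, and the class name is mine, not from any Hadoop code; it just shows how much a missing buffer costs even before network round trips enter the picture:
{code}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Illustration only: compares draining a file one byte per read() call
// with and without a buffer in front of the stream.
public class ByteAtATimeDemo {
  private static long drain(InputStream in) throws Exception {
    long start = System.currentTimeMillis();
    while (in.read() != -1) {
      // one byte per call
    }
    in.close();
    return System.currentTimeMillis() - start;
  }

  public static void main(String[] args) throws Exception {
    String file = args[0];  // any reasonably large local file
    System.out.println("unbuffered: " + drain(new FileInputStream(file)) + "ms");
    System.out.println("buffered:   "
        + drain(new BufferedInputStream(new FileInputStream(file))) + "ms");
  }
}
{code}
If the DFS read path in TRUNK is doing something analogous per byte, the slowdown above is the kind of thing you'd expect.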
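For anyone who doesn't want to open the attachment, the benchmark is shaped roughly like the sketch below. The class and constant names here are made up for illustration (the real TestMapFile.java is a JUnit test run via bin/hadoop); it just uses the stock MapFile.Writer/MapFile.Reader API to write N sorted rows and then time N random MapFile.get calls, logging progress every 1000 reads:
{code}
import java.util.Random;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

// Illustrative sketch only -- not the attached TestMapFile.java.
public class MapFileRandomReadBenchmark {
  private static final Log LOG = LogFactory.getLog(MapFileRandomReadBenchmark.class);
  private static final int ROWS = 100000;  // 100k, as in the updated attachment
  private static final String DIR = "testMapFileRandomAccess";

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Write ROWS key/value pairs; MapFile requires keys appended in sorted
    // order, so keys are zero-padded decimals.
    LOG.info("Writing " + ROWS + " rows to " + DIR);
    long start = System.currentTimeMillis();
    MapFile.Writer writer = new MapFile.Writer(conf, fs, DIR, Text.class, Text.class);
    for (int i = 0; i < ROWS; i++) {
      Text key = new Text(String.format("%010d", i));
      writer.append(key, key);
    }
    writer.close();
    LOG.info("Writing " + ROWS + " records took "
        + (System.currentTimeMillis() - start) + "ms");

    // Random reads via MapFile.get, logging every 1000 reads.
    LOG.info("Reading " + ROWS + " random rows");
    start = System.currentTimeMillis();
    MapFile.Reader reader = new MapFile.Reader(fs, DIR, conf);
    Random random = new Random();
    Text value = new Text();
    for (int i = 1; i <= ROWS; i++) {
      Text key = new Text(String.format("%010d", random.nextInt(ROWS)));
      reader.get(key, value);
      if (i % 1000 == 0) {
        LOG.info("Read " + i);
      }
    }
    reader.close();
    LOG.info("Reading " + ROWS + " random records took "
        + (System.currentTimeMillis() - start) + "ms");
  }
}
{code}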
> MapFile.get on HDFS in TRUNK is WAY!!! slower than 0.15.x
> ---------------------------------------------------------
>
> Key: HADOOP-2434
> URL: https://issues.apache.org/jira/browse/HADOOP-2434
> Project: Hadoop
> Issue Type: Bug
> Components: io
> Reporter: stack
> Attachments: TestMapFile.java, TestMapFile.java
>
>
> The stall happens down in SequenceFile, in the first call to getPos inside
> readRecordLength. I tried the johano patch from HADOOP-2172 that restores
> the positional cache, but that didn't seem to be the issue here.
> Here is data to support my assertion.
> I wrote a little program to make a MapFile of 1M records. I then did 1M
> random reads from the same file. Below are timings from a 0.15.0 run and a
> TRUNK run as of this afternoon.
> 0.15.x branch:
> {code}
> [EMAIL PROTECTED] branch-0.15]$ ./bin/hadoop org.apache.hadoop.io.TestMapFile
> .07/12/15 01:29:02 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 07/12/15 01:29:02 INFO io.TestMapFile: Writing 1000000 rows to
> testMapFileRandomAccess
> 07/12/15 01:32:04 INFO io.TestMapFile: Writing 1000000 records took 182009ms
> 07/12/15 01:32:04 INFO io.TestMapFile: Reading 1000000 random rows
> 07/12/15 01:48:02 INFO io.TestMapFile: Reading 1000000 random records took
> 958243ms
> Time: 1,140.652
> OK (1 test)
> {code}
> For the test below, using TRUNK r604352, I amended the test so that it outputs
> a log message every 100k reads:
> {code}
> [EMAIL PROTECTED] hadoop-trunk]$ ./bin/hadoop org.apache.hadoop.io.TestMapFile
> .07/12/15 01:56:34 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 07/12/15 01:56:34 INFO io.TestMapFile: Writing 1000000 rows to
> testMapFileRandomAccess
> 07/12/15 01:59:38 INFO io.TestMapFile: Writing 1000000 records took 183986ms
> 07/12/15 01:59:38 INFO io.TestMapFile: Reading 1000000 random rows
> .........
> {code}
> After 20 minutes it still hadn't printed the 'read 100k' message (I had to
> leave -- will fill in the final figures later).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.