It works, and the performance is breathtaking:
8.6 million entities (4.3 million lines x 2 entities per line) created in 1.5h,
using 100 shards...
Compared to my previous non-blob-based mapper job, CPU cost remains a little
high (190 CPU hours), but I can't complain.
Thank you guys.
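
For a rough sense of scale, the figures above can be turned into per-second
rates. This is only a back-of-the-envelope sketch in Python: the function name
and parameters are made up for illustration, and it assumes that 1.5h is
wall-clock time and that the work was spread evenly across the shards.

```python
def throughput(lines, entities_per_line, hours, shards, cpu_hours):
    """Return rough rates for a bulk-load run.

    All inputs are the figures reported in this message; the even split
    across shards is an assumption, not something that was measured.
    """
    total = lines * entities_per_line            # entities created
    per_sec = total / (hours * 3600)             # overall entities per second
    per_shard = per_sec / shards                 # entities per second per shard
    cpu_ms = cpu_hours * 3600 * 1000 / total     # CPU milliseconds per entity
    return total, per_sec, per_shard, cpu_ms


total, per_sec, per_shard, cpu_ms = throughput(
    lines=4_300_000, entities_per_line=2, hours=1.5, shards=100, cpu_hours=190
)
print(f"{total:,} entities, {per_sec:.0f}/s overall, "
      f"{per_shard:.1f}/s per shard, {cpu_ms:.0f} ms CPU per entity")
```

With these numbers that is roughly 1,600 entities per second overall and about
80 ms of CPU per entity.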

From:  "Ikai Lan (Google)" <[email protected]>
Reply-To:  <[email protected]>
Date:  Wed, 17 Nov 2010 16:06:07 -0800
To:  <[email protected]>
Subject:  Re: [appengine-java] Mapper & Blobstore bytes read limit

The bug has been fixed. Check out the latest code from the
appengine-mapreduce project.

Note that the ratio between blobstore bytes read and blob size is not 1:1.
In my tests they were closer to 10:1. This is expected behavior for the time
being. We're working on more options so users can better tune the behavior.
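
To get a feel for what that ratio means for quota planning, here is a small
Python sketch. It only multiplies blob size by an assumed read ratio. The 10:1
value is the rough figure quoted above, not a guaranteed constant; the 200 MB
file and the 100 GB quota come from the quoted messages below; the helper name
and the `attempts` parameter (modelling retried tasks re-reading data) are
assumptions made for illustration.

```python
MB = 1000 ** 2
GB = 1000 ** 3


def estimated_bytes_read(blob_size_bytes, read_ratio=10.0, attempts=1.0):
    """Estimate blobstore bytes read for one mapper run over a blob.

    read_ratio: bytes read per byte of blob (assumed ~10, per the note above).
    attempts:   average number of times each chunk is processed; assumed to
                scale reads linearly when tasks are retried.
    """
    return blob_size_bytes * read_ratio * attempts


blob = 200 * MB     # CSV file size reported earlier in the thread
quota = 100 * GB    # blobstore bytes read quota reported earlier in the thread

estimate = estimated_bytes_read(blob)
print(f"estimated read: {estimate / GB:.1f} GB "
      f"({estimate / quota:.0%} of a {quota // GB} GB quota)")
```

Under these assumptions a single clean pass over a 200 MB blob would read about
2 GB, a small fraction of the quota. Heavy task retries would multiply that.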

--
Ikai Lan 
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine



On Wed, Nov 17, 2010 at 2:19 AM, Cyrille Vincey <[email protected]> wrote:
> VERY good news.
> Can't wait. Thanks.
> 
> From:  "Ikai Lan (Google)" <[email protected]>
> Reply-To:  <[email protected]>
> Date:  Tue, 16 Nov 2010 12:07:59 -0800
> 
> To:  <[email protected]>
> Subject:  Re: [appengine-java] Mapper & Blobstore bytes read limit
> 
> We discovered a bug. We're not reading in the entire blob, but we are reading
> in far too much data.
> 
> Fred has a fix waiting in the rafters. I'll post again when it's been pushed.
> 
> --
> Ikai Lan 
> Developer Programs Engineer, Google App Engine
> Blogger: http://googleappengine.blogspot.com
> Reddit: http://www.reddit.com/r/appengine
> Twitter: http://twitter.com/app_engine
> 
> 
> 
> On Thu, Nov 4, 2010 at 2:36 AM, Cyrille Vincey <[email protected]> wrote:
>> Not a lot of interesting stuff to say:
>> 1. My code is about as simple as your sample code: the only real difference
>> is that I create 2 parent/child entities in a row for each CSV line.
>> 2. My CSV file contains 4.3 million lines.
>> 3. I launched the mapper job with 10 shards.
>> 4. "worker-attempt-XXX" tasks had 20 retries each on average.
>> 5. The blobstore bytes read quota (100 GB) was reached within the first 3
>> hours.
>> 6. An estimated 10% of the entities were actually created after 24h (with my
>> previous non-blob-based mapper job, those 4.3 million entities were created
>> within 1 day).
>> 7. The log does not reveal anything interesting.
>> 
>> I am currently running a new test with a 500,000-line CSV file (20 MB).
>> Performance looks better. It seems to me that blob file size may influence
>> mapper performance.
>> 
>> If you need more details, let me know.
>> 
>> From:  "Ikai Lan (Google)" <[email protected]>
>> Reply-To:  <[email protected]>
>> Date:  Wed, 3 Nov 2010 12:22:10 -0700
>> To:  <[email protected]>
>> Subject:  Re: [appengine-java] Mapper & Blobstore bytes read limit
>> 
>> This behavior doesn't seem right. No, the entire blob should not be getting
>> read. We'll look into this.
>> 
>> Do you have any more details? Could tasks be getting retried?
>> 
>> --
>> Ikai Lan 
>> Developer Programs Engineer, Google App Engine
>> Blogger: http://googleappengine.blogspot.com
>> Reddit: http://www.reddit.com/r/appengine
>> Twitter: http://twitter.com/app_engine
>> 
>> 
>> 
>> On Tue, Nov 2, 2010 at 9:42 AM, Cyrille Vincey <[email protected]> wrote:
>>> I've been testing Ikai's bulkload mapper (see URL below) with a pretty big
>>> CSV file (200 MB).
>>> It works great, and I encourage most of you to consider implementing this
>>> for entity uploads.
>>> 
>>> However, I am running into one last issue with an unexpected quota:
>>> blobstore bytes read.
>>> This quota cannot be tuned via the billing settings, and it's not clear
>>> whether it limits the speed of my process or not when it's reached.
>>> 
>>> 
>>> See? Yep, that's a lot of bytes read...
>>> Could someone confirm that the blob CSV file is *NOT* fully fetched each
>>> time the mapper iterates on a new line?
>>> 
>>> (Ikai's post)
>>> http://ikaisays.com/2010/08/11/using-the-app-engine-mapper-for-bulk-data-import/
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine for Java" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> <mailto:google-appengine-java%[email protected]> .
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine-java?hl=en.



<<inline: Capture d'écran 2010-11-02 à 17.17.25.png>>
