Create a task (for the task queue), which we'll call SomeTaskServlet,
that imports the section of the file between two line numbers that are
passed into it.

Then, in whatever kicks off the import, here's what you'd do:
1. count how many lines are in the file - let's say 105
2. divide that by ten (make sure to handle the remainder!!)
3. kick off/queue 11 SomeTaskServlet tasks (10 full batches of 10
   lines, plus one for the remainder):
   - lines 1-10
   - lines 11-20
   - lines 21-30
   ...
   - lines 101-105
4. Make sure that your queued task is idempotent
(http://en.wikipedia.org/wiki/Idempotence), and that it throws an
exception if there's a problem.  That way, the queue processor will
retry it on error, a retry won't create duplicate records, and you'll
never have to worry about a thing.
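
Here's a rough sketch of steps 1-3 in plain Java, just to show the range arithmetic. ChunkPlanner and plan() are names I made up for illustration, and the actual queueing call is left as a comment because it depends on which task queue API version you're on.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkPlanner {

    /** Returns inclusive, 1-based {start, end} line ranges covering lines 1..totalLines. */
    static List<int[]> plan(int totalLines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= totalLines; start += chunkSize) {
            // Math.min handles the remainder: the last range stops at totalLines.
            int end = Math.min(start + chunkSize - 1, totalLines);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] range : plan(105, 10)) {
            // In the real app, this is where you'd queue one SomeTaskServlet
            // task with range[0] and range[1] as its parameters.
            System.out.println("lines " + range[0] + "-" + range[1]);
        }
    }
}
```

For 105 lines and a chunk size of 10 this gives 11 ranges, the last one being lines 101-105.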

The one big gotcha is that you really should know how many records you
have to process up front, or you'll have a hard time knowing when to
stop chunking.  This is tough when you're dealing with databases in
App Engine, because (afaik) you can't "SELECT COUNT(*)", but it's not
a problem here: you're working with a file, so you can just count its
lines.
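
Counting the lines is cheap compared to the datastore writes. A minimal sketch, assuming the file fits the usual one-record-per-line CSV layout (LineCounter is a made-up name, and the second sample row in main is invented for the demo):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineCounter {

    /** Counts the lines available from the given reader, then closes it. */
    static int countLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            // Rethrow unchecked so the caller fails loudly instead of
            // silently importing a partial file.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "-470,16/12/2008 0:00:00,125\n"
                   + "-471,17/12/2008 0:00:00,126\n";
        System.out.println(countLines(new StringReader(csv)));
    }
}
```

In your case you'd pass in a FileReader (or an InputStreamReader around your FileInputStream) instead of the StringReader.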

Simple!

If that file grows, and you wanna make sure you're scalable, then have
SomeTaskServlet handle a max number of lines - say 10.  If the range
that was passed into it is larger than 10, it splits the work it was
given into 10 batches and queues each batch back to another instance
of itself.  By the time you have a small enough batch, you'll have a
chunk of data that you can process in about 1/10 second.  I'd
recommend giving this task its own queue so you can throttle it and
not eat up your dynamic concurrent thread count (or whatever they call
that).
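
The self-splitting idea looks roughly like this. It's only a simulation: an in-memory Deque stands in for the real task queue, the "import" just records the range, and RecursiveSplitter, MAX_LINES and FAN_OUT are names I picked for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RecursiveSplitter {
    static final int MAX_LINES = 10; // most lines a single task imports itself
    static final int FAN_OUT = 10;   // how many sub-tasks an oversized task queues

    final Deque<int[]> queue = new ArrayDeque<>();    // stand-in for the task queue
    final List<int[]> processed = new ArrayList<>();  // ranges that were "imported"

    /** What one task does with an inclusive range of line numbers. */
    void handle(int start, int end) {
        int count = end - start + 1;
        if (count <= MAX_LINES) {
            processed.add(new int[] {start, end}); // small enough: import it here
            return;
        }
        // Too big: split into at most FAN_OUT batches and queue them back.
        int size = (count + FAN_OUT - 1) / FAN_OUT; // ceiling division
        for (int s = start; s <= end; s += size) {
            queue.add(new int[] {s, Math.min(s + size - 1, end)});
        }
    }

    /** Drains the simulated queue, starting from one task for the whole range. */
    void run(int start, int end) {
        queue.add(new int[] {start, end});
        while (!queue.isEmpty()) {
            int[] range = queue.poll();
            handle(range[0], range[1]);
        }
    }

    public static void main(String[] args) {
        RecursiveSplitter splitter = new RecursiveSplitter();
        splitter.run(1, 5000);
        System.out.println(splitter.processed.size() + " leaf tasks");
    }
}
```

With 5,000 lines this splits 5000 into 500s, then 50s, then 5s, so three rounds of fan-out before any task does real work.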

Let me know whether this makes sense.  I just did this to import 5,000
records from another system via REST.  The first several rounds keep
forking off more and more tasks to chunk the data down into smaller
bits.  At the end, each of the hundreds of tasks has SUCH a small
job to do that you can throttle them, and they retry themselves on
error.

- Blake

On Feb 18, 4:31 pm, novarse <[email protected]> wrote:
> Hello,
> I'm trying to get data from csv files into my datastore tables. My app
> is showing cpu loadings of
>  30356ms 20023cpu_ms 11480api_cpu_ms from the dash board and I was
> wondering if someone could see how I could improve this situation. I'm
> pretty new to Java.
>
> sample line from file:
> -470,16/12/2008 0:00:00,125
>
> this parses the file:
>         private void processEvents(String fileName) {
>                 try {
>                         previousLineNumber = 0;
>                         i = 1;
>                         file = new File(fileName);
>                         CSVParser shredder = new CSVParser(new FileInputStream(file));
>                         while ((t = shredder.nextValue()) != null) {
>                                 if (previousLineNumber != shredder.getLastLineNumber()) {
>                                         if (previousLineNumber != 0) { // save event
>                                                 saveData(jdoEvent);
>                                         }
>                                         previousLineNumber = shredder.getLastLineNumber();
>                                         i = 1;
>                                 } else
>                                         i++;
>                                 switch (i) {
>                                 case 1:
>                                         jdoEvent.setPKeyEventID(Long.parseLong(t));
>                                         break;
>                                 case 2:
>                                         try {
>                                                 Date d = processDate(t);
>                                                 jdoEvent.setDate(d);
>                                         } catch (ParseException e) {
>                                                 System.out.println(e.getMessage());
>                                         }
>                                         break;
>                                 case 3:
>                                         jdoEvent.setFKeyRaceDescription(Long.parseLong(t));
>                                         break;
>                                 }
>                         }
>
>                         if (previousLineNumber != 0) {
>                                 saveData(jdoEvent);
>                         }
>                 } catch (Exception e) {
>                         System.err.println(e.getMessage());
>                 }
>         }
>
> this saves the object:
>         private <J> void saveData(J jdoObject) {
>                 PersistenceManager pm = PMF.get().getPersistenceManager();
>                 try {
>                         pm.makePersistent(jdoObject);
>                 } finally {
>                         pm.close();
>                 }
>         }
>
> this is my data object:
>
> package com.myproj.client;
>
> import java.util.Date;
>
> import javax.jdo.annotations.IdGeneratorStrategy;
> import javax.jdo.annotations.IdentityType;
> import javax.jdo.annotations.PersistenceCapable;
> import javax.jdo.annotations.Persistent;
> import javax.jdo.annotations.PrimaryKey;
>
> import com.google.gwt.user.client.rpc.IsSerializable;
>
> @PersistenceCapable(identityType = IdentityType.APPLICATION)
> public class JdoEvent implements IsSerializable {
>
>         @PrimaryKey
>         @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
>         private Long pKeyEventID;
>
>         @Persistent
>         private Date date;
>
>         @Persistent
>         private Long fKeyRaceDescription;
>
>         public JdoEvent() {
>
>         }
>
>         public void setDate(Date date) {
>                 this.date = date;
>         }
>
>         public Date getDate() {
>                 return date;
>         }
>
>         public void setPKeyEventID(Long pKeyEventID) {
>                 this.pKeyEventID = pKeyEventID;
>         }
>
>         public Long getPKeyEventID() {
>                 return pKeyEventID;
>         }
>
>         public void setFKeyRaceDescription(Long fKeyRaceDescription) {
>                 this.fKeyRaceDescription = fKeyRaceDescription;
>         }
>
>         public Long getFKeyRaceDescription() {
>                 return fKeyRaceDescription;
>         }
>
>         public String getValues() {
>                 return getPKeyEventID() + ";  " + getFKeyRaceDescription() + ";  "
>                                 + getDate();
>         }
>
> }
>
> Thank you
