Create a task (for the task queue), which we'll call SomeTaskServlet, that imports the section of the file between two line numbers that are passed into it.
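For what it's worth, here is a rough sketch of the two plain-Java pieces such a task relies on: splitting a line range into chunks, and reading only the lines inside a given range. The class and method names (LineChunks, split, readLines) are made up for illustration, and the actual task-queue calls are left out on purpose, so treat it as a starting point under those assumptions rather than the real servlet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of App Engine or any library.
final class LineChunks {

    /** Splits the inclusive range [first, last] into consecutive ranges of at most chunkSize lines. */
    static List<int[]> split(int first, int last, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = first; start <= last; start += chunkSize) {
            // The final range is shorter when the count does not divide evenly.
            int end = Math.min(start + chunkSize - 1, last);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    /** Returns lines first..last (1-based, inclusive) and ignores lines outside that range. */
    static List<String> readLines(Reader source, int first, int last) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = in.readLine()) != null && lineNumber < last) {
                lineNumber++;
                if (lineNumber >= first) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A 105-line file in chunks of 10: one task would be queued per range.
        for (int[] range : split(1, 105, 10)) {
            System.out.println("lines " + range[0] + "-" + range[1]);
        }
        // A task handed the range 2-3 would import only these lines.
        System.out.println(readLines(new StringReader("a\nb\nc\nd"), 2, 3));
    }
}
```

With 105 lines and a chunk size of 10 this comes out to 11 ranges, the last one being 101-105. A task that is handed a range bigger than its limit could call split on that range and queue each piece back to itself.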
In this task above, here's what you'd do:

1. Count how many lines are in the file - let's say 105.
2. Divide that by ten (make sure to handle the remainder!!).
3. Kick off/queue 11 SomeTaskServlet tasks (ten full chunks plus one for the remainder):
   - lines 1-10
   - lines 11-20
   - lines 21-30
   ...
   - lines 91-100
   - lines 101-105
4. Make sure that your queued task is idempotent (http://en.wikipedia.org/wiki/Idempotence), and throw an exception if there's a problem. That way, the queue processor will retry it on error, and you'll never have to worry about a thing.

The one big gotcha is that you really should know how many records you have to process up front, or you'll have a hard time knowing when to stop chunking. This is tough when you're dealing with databases in App Engine, because (afaik) you can't "SELECT COUNT(*)", but you're working with a file. Simple!

If that file grows, and you wanna make sure you're scalable, then have SomeTaskServlet handle a max number of lines - say 10. If the range that was passed into it is larger than 10, then it splits the work it was given into 10 batches and queues each one back to another instance of itself. By the time you have a small enough batch, you'll have a chunk of data that you can process in 1/10 second.

I'd recommend giving this task its own queue so you can throttle it so that you don't eat up your dynamic concurrent thread count (or whatever they call that).

Reply whether this makes sense. I just did this to import 5,000 records from another system via REST. The first several rounds keep forking off more and more threads to chunk the data down into smaller bits. At the end, each of the hundreds of threads has SUCH a small job to do, you can throttle it, and they retry themselves on error.

- Blake

On Feb 18, 4:31 pm, novarse <[email protected]> wrote:
> Hello,
> I'm trying to get data from csv files into my datastore tables.
> My app is showing cpu loadings of
> 30356ms 20023cpu_ms 11480api_cpu_ms from the dash board and I was
> wondering if someone could see how I could improve this situation. I'm
> pretty new to Java.
>
> sample line from file:
> -470,16/12/2008 0:00:00,125
>
> this parses the file:
> private void processEvents(String fileName) {
>     try {
>         previousLineNumber = 0;
>         i = 1;
>         file = new File(fileName);
>         CSVParser shredder = new CSVParser(new FileInputStream(file));
>         while ((t = shredder.nextValue()) != null) {
>             if (previousLineNumber != shredder.getLastLineNumber()) {
>                 if (previousLineNumber != 0) { // save event
>                     saveData(jdoEvent);
>                 }
>                 previousLineNumber = shredder.getLastLineNumber();
>                 i = 1;
>             } else
>                 i++;
>             switch (i) {
>             case 1:
>                 jdoEvent.setPKeyEventID(Long.parseLong(t));
>                 break;
>             case 2:
>                 try {
>                     Date d = processDate(t);
>                     jdoEvent.setDate(d);
>                 } catch (ParseException e) {
>                     System.out.println(e.getMessage());
>                 }
>                 break;
>             case 3:
>                 jdoEvent.setFKeyRaceDescription(Long.parseLong(t));
>                 break;
>             }
>         }
>
>         if (previousLineNumber != 0) {
>             saveData(jdoEvent);
>         }
>     } catch (Exception e) {
>         System.err.println(e.getMessage());
>     }
> }
>
> this saves the object:
> private <J> void saveData(J jdoObject) {
>     PersistenceManager pm = PMF.get().getPersistenceManager();
>     try {
>         pm.makePersistent(jdoObject);
>     } finally {
>         pm.close();
>     }
> }
>
> this is my data object:
>
> package com.myproj.client;
>
> import java.util.Date;
>
> import javax.jdo.annotations.IdGeneratorStrategy;
> import javax.jdo.annotations.IdentityType;
> import javax.jdo.annotations.PersistenceCapable;
> import javax.jdo.annotations.Persistent;
> import javax.jdo.annotations.PrimaryKey;
>
> import com.google.gwt.user.client.rpc.IsSerializable;
>
> @PersistenceCapable(identityType = IdentityType.APPLICATION)
> public class JdoEvent implements IsSerializable {
>
>     @PrimaryKey
>     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
>     private Long pKeyEventID;
>
>     @Persistent
>     private Date date;
>
>     @Persistent
>     private Long fKeyRaceDescription;
>
>     public JdoEvent() {
>
>     }
>
>     public void setDate(Date date) {
>         this.date = date;
>     }
>
>     public Date getDate() {
>         return date;
>     }
>
>     public void setPKeyEventID(Long pKeyEventID) {
>         this.pKeyEventID = pKeyEventID;
>     }
>
>     public Long getPKeyEventID() {
>         return pKeyEventID;
>     }
>
>     public void setFKeyRaceDescription(Long fKeyRaceDescription) {
>         this.fKeyRaceDescription = fKeyRaceDescription;
>     }
>
>     public Long getFKeyRaceDescription() {
>         return fKeyRaceDescription;
>     }
>
>     public String getValues() {
>         return getPKeyEventID() + "; " + getFKeyRaceDescription() + "; " + getDate();
>     }
>
> }
>
> Thank you

--
You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
