Oh, I didn't realise the number of lines to process in each batch had to be that small. Originally the file was 1.37 MB with 47,400 lines, each line similar to:
1,298,"B",3,30/12/1899 1:31:37,6
I then split it into 500 files of about 3 KB each, containing about 96 lines apiece, and the server still balked at them. Processed locally, each file seemed to take no time at all, and I assumed it would be the same on the online server. I'll try to follow your advice on this, thanks.
On Feb 20, 3:27 pm, Blake wrote:
> Create a task (for the task queue), which we'll call SomeTaskServlet,
> that imports the section of the file between two line numbers passed
> into it.
>
> In the task that starts the import, here's what you'd do:
> 1. Count how many lines are in the file; let's say 105.
> 2. Divide that by ten (make sure to handle the remainder!!).
> 3. Kick off/queue 11 SomeTaskServlet tasks, ten of 10 lines plus one
>    for the remainder:
>    - lines 1-10
>    - lines 11-20
>    - lines 21-30
>    ...
>    - lines 91-100
>    - lines 101-105
> 4. Make sure that your queued task is idempotent
>    (http://en.wikipedia.org/wiki/Idempotence), and that it throws an
>    exception if there's a problem. That way, the queue processor will
>    retry it on error, and you'll never have to worry about a thing.
>
> The one big gotcha is that you really should know up front how many
> records you have to process, or you'll have a hard time knowing when to
> stop chunking. This is tough when you're dealing with the datastore in
> App Engine, because (afaik) you can't "SELECT COUNT(*)", but you're
> working with a file, so you can simply count its lines.
>
> Simple!
>
> If that file grows, and you want to make sure you're scalable, have
> SomeTaskServlet handle a maximum number of lines, say 10. If the range
> passed into it is larger than 10, it splits that range into 10 batches
> and queues each one back to another instance of itself. By the time a
> batch is small enough, you'll have a chunk of data that you can process
> in 1/10 second. I'd recommend giving this task its own queue so you can
> throttle it and not eat up your dynamic concurrent thread count (or
> whatever they call that).
>
> Reply whether this makes sense. I just did this to import 5,000
> records from another system via REST. The first several rounds keep
> forking off more and more tasks to chunk the data down into smaller
> bits. At the end, each of the hundreds of tasks has SUCH a small job to
> do that you can throttle it, and they retry themselves on error.
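Blake's chunking steps, and the self-splitting variant for a growing file, can be sketched in plain Java. This is a minimal sketch under stated assumptions: `ChunkPlanner`, `LineRange`, `splitIntoRanges` and `planLeafTasks` are hypothetical names, and an in-memory `ArrayDeque` stands in for the App Engine task queue, so nothing is actually enqueued. In a real app each range would presumably be handed to SomeTaskServlet as two request parameters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ChunkPlanner {

    /** Hypothetical value type: a 1-based, inclusive range of file lines. */
    static final class LineRange {
        final int first;
        final int last;

        LineRange(int first, int last) {
            this.first = first;
            this.last = last;
        }

        int size() {
            return last - first + 1;
        }

        @Override
        public String toString() {
            return first + "-" + last;
        }
    }

    /** Cuts lines first..last into consecutive ranges of chunkSize lines; the last range holds the remainder. */
    static List<LineRange> splitIntoRanges(int first, int last, int chunkSize) {
        List<LineRange> ranges = new ArrayList<>();
        for (int start = first; start <= last; start += chunkSize) {
            ranges.add(new LineRange(start, Math.min(start + chunkSize - 1, last)));
        }
        return ranges;
    }

    /**
     * Simulates the self-splitting task. A task whose range exceeds maxLines
     * re-queues at most fanOut smaller ranges; otherwise it is a leaf that would
     * import its lines. The deque stands in for the real task queue.
     */
    static List<LineRange> planLeafTasks(int totalLines, int maxLines, int fanOut) {
        if (maxLines < 1 || fanOut < 2) {
            throw new IllegalArgumentException("need maxLines >= 1 and fanOut >= 2");
        }
        List<LineRange> leaves = new ArrayList<>();
        if (totalLines < 1) {
            return leaves;
        }
        Deque<LineRange> queue = new ArrayDeque<>();
        queue.add(new LineRange(1, totalLines));
        while (!queue.isEmpty()) {
            LineRange task = queue.poll();
            if (task.size() <= maxLines) {
                leaves.add(task); // small enough to import in one go
            } else {
                // Ceiling division keeps the number of children at or below fanOut.
                int subSize = (task.size() + fanOut - 1) / fanOut;
                queue.addAll(splitIntoRanges(task.first, task.last, subSize));
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(splitIntoRanges(1, 105, 10));
        // prints [1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-105]
        System.out.println(planLeafTasks(47_400, 10, 10).size() + " leaf tasks for 47,400 lines");
    }
}
```

With 105 lines and chunks of 10, `splitIntoRanges` yields eleven ranges, the last holding the five-line remainder. Because each split creates at most `fanOut` sub-ranges, no single task queues more than that many children, which is what lets a dedicated, throttled queue pace the work.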
>
> - Blake
>
> On Feb 18, 4:31 pm, novarse wrote:
>
> > Hello,
> > I'm trying to get data from CSV files into my datastore tables. My app
> > is showing CPU loads of 30356ms 20023cpu_ms 11480api_cpu_ms on the
> > dashboard, and I was wondering if someone could see how I could improve
> > this situation. I'm pretty new to Java.
> >
> > Sample line from the file:
> > -470,16/12/2008 0:00:00,125
> >
> > This parses the file:
> > private void processEvents(String fileName) {
> >     try {
> >         previousLineNumber = 0;
> >         i = 1;
> >         file = new File(fileName);
> >         CSVParser shredder = new CSVParser(new FileInputStream(file));
> >         while ((t = shredder.nextValue()) != null) {
> >             if (previousLineNumber != shredder.getLastLineNumber()) {
> >                 if (previousLineNumber != 0) { // save event
> >                     saveData(jdoEvent);
> >                 }
> >                 previousLineNumber = shredder.getLastLineNumber();
> >                 i = 1;
> >             } else {
> >                 i++;
> >             }
> >             switch (i) {
> >             case 1:
> >                 jdoEvent.setPKeyEventID(Long.parseLong(t));
> >                 break;
> >             case 2:
> >                 try {
> >                     Date d = processDate(t);
> >                     jdoEvent.setDate(d);
> >                 } catch (ParseException e) {
> >                     System.out.println(e.getMessage());
> >                 }
> >                 break;
> >             case 3:
> >                 jdoEvent.setFKeyRaceDescription(Long.parseLong(t));
> >                 break;
> >             }
> >         }
> >
> >         if (previousLineNumber != 0) {
> >             saveData(jdoEvent);
> >         }
> >     } catch (Exception e) {
> >         System.err.println(e.getMessage());
> >     }
> > }
> >
> > This saves the object:
> > private <J> void saveData(J jdoObject) {
> >     PersistenceManager pm = PMF.get().getPersistenceManager();
> >     try {
> >         pm.makePersistent(jdoObject);
> >     } finally {
> >         pm.close();
> >     }
> > }
> >
> > This is my data object:
> >
> > package com.myproj.client;
> >
> > import java.util.Date;
> >
> > import javax.jdo.annotations.IdGeneratorStrategy;
> > import javax.jdo.annotations.IdentityType;
> > import javax.jdo.annotations.PersistenceCapable;
> > import javax.jdo.annotations.Persistent;
> > import javax.jdo.annotations.PrimaryKey;
> >
> > import com.google.gwt.user.client.rpc.IsSerializable;
> >
> > @PersistenceCapable(identityType = IdentityType.APPLICATION)
> > public class JdoEvent implements IsSerializable {
> >
> >     @PrimaryKey
> >     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
> >     private Long pKeyEventID;
> >
> >     @Persistent
> >     private Date date;
> >
> >     @Persistent
> >     private Long fKeyRaceDescription;
> >
> >     public JdoEvent() {
> >     }
> >
> >     public void setDate(Date date) {
> >         this.date = date;
> >     }
> >
> >     public Date getDate() {
> >         return date;
> >     }
> >
> >     public void setPKeyEventID(Long pKeyEventID) {
> >         this.pKeyEventID = pKeyEventID;
> >     }
> >
> >     public Long getPKeyEventID() {
> >         return pKeyEventID;
> >     }
> >
> >     public void setFKeyRaceDescription(Long fKeyRaceDescription) {
> >         this.fKeyRaceDescription = fKeyRaceDescription;
> >     }
> >
> >     public Long getFKeyRaceDescription() {
> >         return fKeyRaceDescription;
> >     }
> >
> >     public String getValues() {
> >         return getPKeyEventID() + "; " + getFKeyRaceDescription() + "; " + getDate();
> >     }
> > }
> >
> > Thank you

--
You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
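On the original question's CPU cost: the quoted `processEvents` opens a new PersistenceManager and makes a separate `makePersistent` call for every line, and it appears to reuse a single `jdoEvent` instance throughout. A common remedy, offered here as a hedged suggestion rather than something confirmed in the thread, is to build a fresh object per line and save a whole chunk in one call (in JDO, one PersistenceManager per batch and `pm.makePersistentAll(batch)`). The sketch below runs without the App Engine SDK: `Event`, `EventStore`, `InMemoryStore`, `parseLine` and `importLines` are hypothetical names, the in-memory map stands in for the datastore, the `dd/MM/yyyy H:mm:ss` pattern is inferred from the sample line, the extra input lines are made up, and keying by event ID is one way to meet Blake's idempotency advice.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventImportSketch {

    /** Plain stand-in for JdoEvent so the sketch runs without the App Engine SDK. */
    static final class Event {
        final long id;
        final Date date;
        final long raceDescriptionId;

        Event(long id, Date date, long raceDescriptionId) {
            this.id = id;
            this.date = date;
            this.raceDescriptionId = raceDescriptionId;
        }
    }

    /** Hypothetical persistence hook: called once per batch, not once per line. */
    interface EventStore {
        void saveAll(List<Event> batch);
    }

    /** In-memory store keyed by event ID, so re-running a chunk overwrites instead of duplicating. */
    static final class InMemoryStore implements EventStore {
        final Map<Long, Event> byId = new HashMap<>();

        @Override
        public void saveAll(List<Event> batch) {
            for (Event e : batch) {
                byId.put(e.id, e);
            }
        }
    }

    /** Parses a line like "-470,16/12/2008 0:00:00,125"; the date pattern is an assumption. */
    static Event parseLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy H:mm:ss");
        fmt.setLenient(false);
        try {
            Date date = fmt.parse(f[1].trim());
            return new Event(Long.parseLong(f[0].trim()), date, Long.parseLong(f[2].trim()));
        } catch (ParseException e) {
            // Unchecked, so a failing chunk surfaces as a task error the queue can retry.
            throw new IllegalArgumentException("bad date in: " + line, e);
        }
    }

    /** Parses every line into a fresh Event and saves batchSize at a time; returns the number of save calls. */
    static int importLines(List<String> lines, int batchSize, EventStore store) {
        List<Event> batch = new ArrayList<>();
        int saveCalls = 0;
        for (String line : lines) {
            batch.add(parseLine(line));
            if (batch.size() == batchSize) {
                store.saveAll(new ArrayList<>(batch));
                batch.clear();
                saveCalls++;
            }
        }
        if (!batch.isEmpty()) {
            store.saveAll(new ArrayList<>(batch)); // the remainder
            saveCalls++;
        }
        return saveCalls;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "-470,16/12/2008 0:00:00,125",
                "-469,17/12/2008 13:05:00,126",
                "-468,18/12/2008 9:30:15,125");
        InMemoryStore store = new InMemoryStore();
        int calls = importLines(lines, 2, store);
        importLines(lines, 2, store); // simulated retry of the same chunk
        System.out.println(calls + " save calls, " + store.byId.size() + " distinct events");
        // prints "2 save calls, 3 distinct events"
    }
}
```

In the sample run the three lines are saved in two calls (a batch of two plus the remainder), and repeating the import leaves three events rather than six, which is the property that makes a queue retry harmless.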
