Does the same happen if you use a single thread? Does the disk space usage go down if you compress the table?
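On the compress question: Derby ships a built-in system procedure, SYSCS_UTIL.SYSCS_COMPRESS_TABLE, that returns unused pages to the operating system. A minimal sketch of calling it (the schema/table names and the surrounding class are illustrative, not from the original post):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class CompressWords {
    // SYSCS_COMPRESS_TABLE reclaims unused space in a table; the third
    // argument (non-zero) asks for a sequential, defragmenting rebuild.
    static final String COMPRESS_CALL =
        "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";

    static void compress(Connection con, String schema, String table)
            throws SQLException {
        try (CallableStatement cs = con.prepareCall(COMPRESS_CALL)) {
            cs.setString(1, schema);   // e.g. "APP", Derby's default schema
            cs.setString(2, table);    // e.g. "WORDS"
            cs.setShort(3, (short) 1); // sequential compression
            cs.execute();
        }
    }
}
```

If the on-disk size shrinks dramatically after compressing, the 13GB was mostly dead space in allocated pages rather than live row data.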
Should the word = word.trim().toLowerCase(); appear before setString? When you store word frequencies, you would have records like "home", 1217, so the number of rows should not exceed a couple of thousand. Does this text really have 200k unique words? Do you have a unique index on the word column?

A suggestion: a single thread that re-uses the connection and the prepared statements might go faster than multiple threads.

Regards,
Ali

--- On Thu, 9/11/08, hcadavid <[EMAIL PROTECTED]> wrote:

> From: hcadavid <[EMAIL PROTECTED]>
> Subject: Derby problem: 13GB of space with 200000 records!
> To: [email protected]
> Date: Thursday, September 11, 2008, 5:39 AM
>
> Dear friends,
>
> I'm using Derby to record word frequencies from a large text corpus
> with a Java program. It works fine with standard statements like
> "INSERT INTO WORDS VALUES('"+word+"',1)" (it takes 50MB to store
> 400000 words), but when I switched to prepared statements and inner
> statements (in order to improve performance) and repeated the
> process, after a few hours of processing (200MB of plain text) the
> database's disk consumption reaches an absurd size: 13GB! I mean,
> 13GB of disk space to store 400000 words (of standard length) and
> their frequencies!! What may be the problem?
> The biggest file is seg0\c3c0.dat (13GB); there is no problem with
> the log files.
> Here is how I'm making insertions and updates:
>
> Connection con = EmbeddedDBMSConnectionBroker.getConnection();
> PreparedStatement st = con.prepareStatement(
>     "INSERT INTO WORDS VALUES(?,1)");
> st.setString(1, word);
>
> word = word.trim().toLowerCase();
>
> try {
>     st.execute();
> }
> catch (SQLIntegrityConstraintViolationException e) {
>     PreparedStatement ps = con.prepareStatement(
>         "update words set frequency=((select frequency from words"
>         + " where word=?)+1) where word=?");
>     ps.setString(1, word);
>     ps.setString(2, word);
>     ps.execute();
> }
>
> con.commit();
> con.close();
>
> This method is used concurrently by 100 threads. Does anyone know the
> cause of this strange Derby behavior? (Using GBs of disk space just
> to store a few words isn't reasonable!)
>
> Thanks in advance
>
> Héctor
> --
> View this message in context:
> http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
> Sent from the Apache Derby Users mailing list archive at Nabble.com.
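To make the suggestion concrete, here is a sketch of a single-threaded version that prepares both statements once and re-uses them, and that normalizes the word before binding it (in the quoted code, trim()/toLowerCase() runs after setString, so the raw word is what actually gets inserted). The class name and the simplified UPDATE (frequency = frequency + 1 instead of the original subquery) are my illustration, not code from the thread:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class WordCounter {
    private final PreparedStatement insert;
    private final PreparedStatement update;

    // Prepare both statements exactly once; re-preparing inside a loop
    // (or per call, per thread) defeats the point of PreparedStatement.
    public WordCounter(Connection con) throws SQLException {
        insert = con.prepareStatement("INSERT INTO WORDS VALUES(?, 1)");
        update = con.prepareStatement(
            "UPDATE WORDS SET frequency = frequency + 1 WHERE word = ?");
    }

    // Normalize BEFORE binding the parameter, so "Home " and "home"
    // count as the same row instead of violating/bloating the table.
    static String normalize(String word) {
        return word.trim().toLowerCase();
    }

    public void count(String rawWord) throws SQLException {
        String word = normalize(rawWord);
        try {
            insert.setString(1, word);
            insert.execute();
        } catch (SQLIntegrityConstraintViolationException e) {
            // Row already exists: bump its frequency instead.
            update.setString(1, word);
            update.execute();
        }
    }
}
```

With the normalization bug in the original ordering, every capitalization and whitespace variant becomes a distinct row, which would also explain far more "unique" words than expected.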
