Does the same happen if you use a single thread? Does disk space usage go down if 
you compress the table?
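In case it helps, Derby ships a system procedure for reclaiming unused space. A minimal sketch, assuming an already-open `Connection con` and that the table lives in the default APP schema (this fragment needs the Derby driver on the classpath to actually run):

```java
import java.sql.CallableStatement;

// Reclaim unused space in the WORDS table.
// SYSCS_UTIL.SYSCS_COMPRESS_TABLE(schema, table, sequential) is a
// built-in Derby system procedure; a non-zero third argument asks
// for the sequential (slower, less temp space) variant.
CallableStatement cs = con.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
cs.setString(1, "APP");      // schema name (assumed default)
cs.setString(2, "WORDS");
cs.setShort(3, (short) 1);
cs.execute();
cs.close();
```

If the file shrinks dramatically afterwards, the 13GB was mostly unreclaimed/duplicate row space rather than live data.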

Shouldn't the line 

word = word.trim().toLowerCase(); 

appear before setString? As written, the raw, un-normalized word is what gets 
bound to the statement.
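To make the point concrete: setString captures the value at bind time, so normalizing `word` afterwards has no effect on what gets inserted. A stand-alone illustration (no Derby needed — the bind is simulated with a plain variable; class and method names are made up for the demo):

```java
public class BindOrderDemo {
    // Simulates PreparedStatement.setString: the value is captured
    // at the moment of the call, not at execute() time.
    static String boundValue;

    static void setString(String value) {
        boundValue = value;
    }

    public static void main(String[] args) {
        // Order used in the posted code: bind first, normalize after.
        String word = "  Home ";
        setString(word);
        word = word.trim().toLowerCase();
        System.out.println("[" + boundValue + "]"); // prints "[  Home ]" - raw value was bound

        // Correct order: normalize first, then bind.
        word = "  Home ";
        word = word.trim().toLowerCase();
        setString(word);
        System.out.println("[" + boundValue + "]"); // prints "[home]"
    }
}
```

With the original ordering, "Home", "home " and "home" are stored as three different rows.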

When you store the frequency of words, you would have records like:
"home", 1217

Then the number of rows should not exceed a couple thousand. Does this text really 
have 200k unique words? Do you have a unique index on the word column? 
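This matters because without a uniqueness constraint the INSERT never throws SQLIntegrityConstraintViolationException, the UPDATE path is never taken, and every occurrence of a word becomes a new row. A sketch of a table definition with the constraint in place (column names taken from the posted statements, types assumed):

```java
import java.sql.Statement;

// Assumes an open Connection `con`. The PRIMARY KEY on WORD makes a
// duplicate INSERT fail with SQLIntegrityConstraintViolationException,
// which is what the posted insert-then-update logic relies on.
Statement s = con.createStatement();
s.executeUpdate(
        "CREATE TABLE WORDS (" +
        "  WORD VARCHAR(100) NOT NULL PRIMARY KEY," +
        "  FREQUENCY INT NOT NULL" +
        ")");
s.close();
```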

A suggestion: a single thread that re-uses the connection and the prepared 
statements might be faster than multiple threads.
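A sketch of that approach — one connection, both statements prepared once and reused for every word (the class name is illustrative, and the UPDATE is simplified to `FREQUENCY = FREQUENCY + 1`, which should be equivalent to the posted self-select; needs a JDBC driver such as Derby embedded to run):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class WordCounter {
    private final Connection con;
    private final PreparedStatement insert;
    private final PreparedStatement update;

    public WordCounter(Connection con) throws SQLException {
        this.con = con;
        con.setAutoCommit(false);
        // Prepared once, reused for every word.
        insert = con.prepareStatement("INSERT INTO WORDS VALUES(?, 1)");
        update = con.prepareStatement(
                "UPDATE WORDS SET FREQUENCY = FREQUENCY + 1 WHERE WORD = ?");
    }

    public void count(String word) throws SQLException {
        word = word.trim().toLowerCase();   // normalize BEFORE binding
        insert.setString(1, word);
        try {
            insert.execute();               // first occurrence: insert
        } catch (SQLIntegrityConstraintViolationException e) {
            update.setString(1, word);      // duplicate: bump the counter
            update.execute();
        }
    }

    public void close() throws SQLException {
        con.commit();                       // commit once (or periodically)
        insert.close();
        update.close();
        con.close();
    }
}
```

Committing in batches instead of per word, and preparing the statements once instead of per call, removes most of the per-insert overhead that the 100 threads were presumably added to hide.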
Regards,
Ali

--- On Thu, 9/11/08, hcadavid <[EMAIL PROTECTED]> wrote:

> From: hcadavid <[EMAIL PROTECTED]>
> Subject: Derby problem: 13GB of space with 200000 records!
> To: [email protected]
> Date: Thursday, September 11, 2008, 5:39 AM
> Dear friends,
> 
> I'm using Derby DB to record word frequencies from a large
> text corpus with a Java program. It works nicely with standard
> statements like "INSERT INTO WORDS VALUES('"+word+"',1)" (it
> takes 50MB to store 400000 words), but when I switched to
> prepared statements and inner statements (in order to improve
> performance) and repeated the process, after a few hours of
> processing (200MB of plain text) the database's disk
> consumption reached an absurd size: 13GB! I mean, 13GB of disk
> space to store 400000 words (of standard length) and their
> frequencies!! What may be the problem?
> The biggest file is seg0\c3c0.dat (13GB); there is no log
> file problem.
> 
> Here is how I'm making insertions and updates:
> 
>     Connection con = EmbeddedDBMSConnectionBroker.getConnection();
>     PreparedStatement st = con.prepareStatement(
>             "INSERT INTO WORDS VALUES(?,1)");
>     st.setString(1, word);
> 
>     word = word.trim().toLowerCase();
> 
>     try {
>         st.execute();
>     }
>     catch (SQLIntegrityConstraintViolationException e) {
>         PreparedStatement ps = con.prepareStatement(
>                 "update words set frequency=((select frequency " +
>                 "from words where word=?)+1) where word=?");
>         ps.setString(1, word);
>         ps.setString(2, word);
>         ps.execute();
>     }
> 
>     con.commit();
>     con.close();
> 
> This method is used concurrently by 100 threads. Please, does
> anyone know the cause of this strange Derby behavior??
> (Handling GBs of disk space just to store a few words isn't
> reasonable!)
> 
> Thanks in advance
> 
> Héctor
> -- 
> View this message in context:
> http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
> Sent from the Apache Derby Users mailing list archive at
> Nabble.com.


