Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-09 Thread Valentin Davydov
On Mon, May 07, 2012 at 01:34:13PM -0400, peter korinis wrote:

> So, if I use gawk to change my comma separated file to | (pipe) delimiter  .
> will it work? 

If you use gawk, you can easily change your comma-separated file into a
series of correct SQL INSERT statements (while also validating the input
data, such as checking for the right number of fields in each row) and not
rely on the sqlite shell's CSV-parsing capabilities.
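A minimal sketch of that approach (the table name t and the two purely numeric columns are invented for the example; real data would also need quoting of text fields and a BEGIN/COMMIT wrapper):

```shell
# Hypothetical sketch: turn each CSV row into an INSERT statement.
# Assumes a table named t with two numeric columns; text fields
# would need quoting and escaping before this is safe.
awk -F, '{ printf "INSERT INTO t VALUES (%s,%s);\n", $1, $2 }' data.csv
```

Piping that output into `sqlite3 mydb.db` would then load the rows.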

Valentin Davydov.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 8 May 2012, at 12:34am, "peter korinis"  wrote:

> I did NOT specify a file name when I started sqlite3 - so I guess all is
> lost. I'll have to redo it.
> 
> So then the first thing is to specify the db (file) name - what syntax? I
> tried typing a file name (test.db) but got an error.

It helps when you read the documentation.

The standard way to give a filename is to supply it when you start the shell tool.
Examples:

~/Desktop> sqlite3 ./mydatabase.sql

C:\ > sqlite3 C:\PROJECT\FILES\DB.SQL
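A quick way to confirm the file is being used (the name mydb.sqlite is just an example): rows written in one invocation are still there in the next, because the same filename was given both times.

```shell
# Start the shell with a filename so the data lands in that file
sqlite3 mydb.sqlite "CREATE TABLE t(x); INSERT INTO t VALUES (1);"
# A later invocation naming the same file sees the saved row
sqlite3 mydb.sqlite "SELECT count(*) FROM t;"
```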

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
I did NOT specify a file name when I started sqlite3 - so I guess all is
lost. I'll have to redo it.

So then the first thing is to specify the db (file) name - what syntax? I
tried typing a file name (test.db) but got an error.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Gerry Snyder
The file name does not necessarily have anything to do with the table name.
On May 7, 2012 2:25 PM, "peter korinis"  wrote:

> Simon
>
> I searched the entire disk for the table name and no matches.
>
>
>
> pk
>
>
>


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 7 May 2012, at 10:25pm, peter korinis  wrote:

> I searched the entire disk for the table name and no matches.

SQLite keeps its tables in databases.  One database is one file on disk.  The 
files are not in any magic place: each time you create a database you have to 
specify a specific file (folder and filename) for SQLite to keep its data in.  
If you want to access data you saved earlier, you have to specify the same file 
(folder and filename) or SQLite will open a new file in this new place you've 
specified, find that there's no data there, and tell you that you have no tables.

So when you put the data into the tables in the first place, make sure you've 
specified a path and filename.  If you find you're creating tables and haven't 
given SQLite a path and filename yet, you're doing the wrong thing !  And when 
you want to get your data back out again, specify the same folder and filename.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Simon

I searched the entire disk for the table name and no matches.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread David Bicking
Did you remember to specify the file name when you started sqlite3? If you 
don't give a file name, everything is done in an in-memory database and is 
discarded when you exit.
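That is easy to verify (a small sketch): each invocation without a filename gets its own fresh in-memory database, so nothing carries over between runs.

```shell
# With no filename, the table lives only in memory for this invocation
printf 'CREATE TABLE t(x);\nINSERT INTO t VALUES (1);\n' | sqlite3
# A second invocation starts empty again, so .tables prints nothing
printf '.tables\n' | sqlite3
```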

David




 From: peter korinis 
To: sqlite-users@sqlite.org 
Sent: Monday, May 7, 2012 5:01 PM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
 
I can't find a table I just created and imported data into.

With .databases I can see 'main' but with no location on disk, and with
.tables I see the table.

Instructions say they are saved ... but where? I can't find them with Windows
search.



pk





Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 7 May 2012, at 10:01pm, peter korinis  wrote:

> I can't find a table I just created and imported data into.
> 
> With .databases I can see 'main' but with no location on disk, and with
> .tables I see the table.
> 
> Instructions say they are saved ... but where? I can't find them with
> Windows search.

You probably imported into one file and now you're opening another.  Try 
looking for the file on disk, making sure it is a reasonable size, then 
specifying the full path (folders and filename) when you open the file.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
I can't find a table I just created and imported data into.

With .databases I can see 'main' but with no location on disk, and with
.tables I see the table.

Instructions say they are saved ... but where? I can't find them with Windows
search.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Thanks for advice

 

http://www.pantz.org/software/sqlite/sqlite_commands_and_general_usage.html

 

had the explanation/example I needed to get the import done successfully. 

Using ".separator ," was what I was missing.

 

peter

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Black, Michael (IS)
Next iteration of my csvimport utility.  I think it handles Simon's test cases 
adequately (or it gives an error).

Now allows for multiline fields
Correctly(?) handles  fields with escaped quotes.
Checks the entire file to ensure all records have the correct field count 
before processing.
Determines # of columns from the 1st record in the file.
Can now change delimiters for fields, records, and field enclosures.

Aborts on malformed records.

Usage: csvimport [-d] [-e] [-r] filename databasename tablename
-d   Change field delimiter, default -d,
-e   Change field enclosure char, default -e\"
-r   Change record delimiter, default -r\n

On my system using 1M records generated with this:
#include <stdio.h>
int main() {
  int i;
  for(i=0; i<1000000; ++i) {
    printf("%d,%d,%d,%d\n",i,i+1,i+2,i+3);
  }
}

Takes 4.1 seconds to import the 1M records.


// csvimport -- At least somewhat RFC4180 compliant
// quoted fields can span multiple lines
// quotes can be inserted by escaping with another quote
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

#define BUFSIZE 100
#define MOD 10
#define MAXTOKEN 10

int nline=0;
int ncol;
char buf[BUFSIZE];
int peek;
char delim=',';
char quote='"';
char eol='\n';

inline int nextChar(FILE *fp) {
  static char *ptr=buf;
  if (*ptr == 0) {
fgets(buf,BUFSIZE,fp);
if (feof(fp)) return EOF;
++nline;
ptr=buf;
  }
  peek = *(ptr+1);
  return *ptr++;
}

int countColumns(FILE *fp) {
  // 1st line(s) of the file determine how many columns we have
  int p;
  int ncol = 0;
  while((p=nextChar(fp))!=EOF) {
if (p == quote) { // start quote
  p=nextChar(fp);
  if (p==quote) continue; // escaped quote
  while((p=nextChar(fp)) && p!=quote); // skip to next quote
}
if (p == delim) {
  ++ncol;
}
if (p == eol) {
  break;
}
  }
  if (feof(fp)) {
return 0;
  }
  return ++ncol;
}

inline char *nextToken(FILE *fp) {
  // return the next token (field) from the file
  int p;
  int n=0;
  static char *token;
  int inquote=0;
  if (token == NULL) {
token=malloc(MAXTOKEN);
if (token==NULL) {
  fprintf(stderr,"malloc error: %%m\n");
  exit(1);
}
  }
  token[0]=0;
  while((p=nextChar(fp))!=EOF) {
if (p == quote) { // start quote
  p=nextChar(fp);
  if (p==quote) { // escaped quote
token[n++]=p;
token[n]=0;
continue;
  }
  token[n++]=p;
  token[n]=0;
  inquote = 1;
  while(inquote) {
p=nextChar(fp);
if (p==quote && peek==quote) {
  p=nextChar(fp); // skip on escaped quote
  token[n++]=p;
  token[n]=0;
  continue;
}
if (p==quote) {
  inquote=0;
} else {
  token[n++]=p;
  token[n]=0;
}
  }
  continue;
}
if (p == delim) {
  token[n]=0;
  return token;
}
if (p == eol) {
  break;
}
token[n++]=p;
token[n]=0;
  }
  if (feof(fp)) {
return NULL;
  }
  token[n]=0;
  return token;
}

void checkrc(sqlite3 *db,int rc,int checkrc,int flag,char *msg,char *str) {
  if (rc != checkrc) {
fprintf(stderr,"\nLine#%d: ",nline);
fprintf(stderr,msg,str);
fprintf(stderr,"%s\n",sqlite3_errmsg(db));
if (flag) { // then fatal
  exit(1);
}
  }
}

char escape(char *s) {
  if (*s != '\\') return *s;
  ++s;
  switch(*s) {
  case 't':
return '\t';
  case 'n':
return '\n';
  case '"':
return '"';
  }
  fprintf(stderr,"Unknown escape sequence=%s\n",--s);
  return '\0';
}

// Add comma delimited file to existing database/table
// Quoted strings are accepted
int main(int argc, char *argv[]) {
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int rc;
  int ninsert=0;
  char sql[32768];
  FILE *fp;
  char *filename;
  char *databasename;
  char *tablename;

  while(argv[1] && argv[1][0]=='-') {
switch(argv[1][1]) {
case 'd':
  delim=escape(&argv[1][2]);
  break;
case 'e':
  eol=escape(&argv[1][2]);
  break;
default:
  fprintf(stderr,"Bad option: %s\n",argv[1]);
}
++argv;
--argc;
  }
  if (argc != 4) {
fprintf(stderr,"Usage: csvimport [-d] [-e] [-r] filename databasename tablename\n");
fprintf(stderr,"-d   Change field delimiter, default -d,\n");
fprintf(stderr,"-e   Change field enclosure char, default -e\\\"\n");
fprintf(stderr,"-r   Change record delimiter, default -r\\n\n");
exit(1);
  }
  filename = argv[1];
  databasename = argv[2];
  tablename = argv[3];
  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error opening database '%s': ",databasename);
  fp=fopen(filename,"r");
  if (fp == NULL) {
perror(filename);
exit(1);
  }
  // count the columns
  ncol = countColumns(fp);

  fprintf(stderr,"%d columns detected...checking file contents\n",ncol);
  rewind(fp);
  int mycol;
  nline=0;
  while((mycol=countColumns(fp)) == ncol);
  if (!feof(fp)) {
fprintf(stderr,"Error at line#%d, 

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread David Bicking
From the docs:

.separator STRING  Change separator used by output mode and .import

Replace STRING with a comma, without quotes, then do your import. It should
work.

David



 From: peter korinis 
To: sqlite-users@sqlite.org 
Sent: Monday, May 7, 2012 1:34 PM
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
 
On 7 May 2012, at 4:41pm, peter korinis  wrote:



> My input file is a comma-delimited text file

> When I run .import I get the following "Error: FILE line 1: expected 46

> columns of data but found 1"

> It seems .import is not recognizing comma delimiter.







"The default separator is a pipe symbol ("|")."



Simon.



So, if I use gawk to change my comma separated file to | (pipe) delimiter ...
will it work?

Do I have to use ".mode csv" as Jonas just advised?



Thanks,

peter



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 7 May 2012, at 6:34pm, peter korinis  wrote:

> 
> 
> "The default separator is a pipe symbol ("|")."
> 
> Simon.
> 
> 
> 
> So, if I use gawk to change my comma separated file to | (pipe) delimiter ...
> will it work?

I think so.  Or you could issue the command '.separator ,' before your import 
command.

> Do I have to use ".mode csv" as Jonas just advised?

That is for output, not input.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
On 7 May 2012, at 4:41pm, peter korinis  wrote:

 

> My input file is a comma-delimited text file

> When I run .import I get the following "Error: FILE line 1: expected 46

> columns of data but found 1"

> It seems .import is not recognizing comma delimiter.

 



 

"The default separator is a pipe symbol ("|")."

 

Simon.

 

So, if I use gawk to change my comma separated file to | (pipe) delimiter ...
will it work?

Do I have to use ".mode csv" as Jonas just advised?

 

Thanks,

peter



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 7 May 2012, at 6:27pm, peter korinis  wrote:

> Without knowing syntax, I'm forced to ask these dumb questions or give up
> (not) ... since no good documentation - sorry.

There is very good documentation.  The program itself says

SQLite version 3.7.12 2012-03-31 02:46:20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> 

when you start it.  And '.help' explains how to use separators.  If you don't 
think that's enough, you can google 'sqlite shell' and find a web page with 
documentation on it.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
No, I had not used ".mode" - the wording described ".mode" as "set output
mode where ..." and the word "output" made me think this was for the .output
statement.

I just tried ".mode csv". What do your * mean? Do I put in the file
and/or table name, or nothing more?

I tried several different ways but still get the same error when I try to
import.

 

Without knowing syntax, I'm forced to ask these dumb questions or give up
(not) ... since no good documentation - sorry.

 

Thanks,

 

Have you run *.mode csv*?

Jonas Malaco Filho

 

2012/5/7 peter korinis 

> Regarding SQLITE3.exe statement ".import FILE TABLE"

> I created a table.

> My input file is a comma-delimited text file

> When I run .import I get the following "Error: FILE line 1: expected 46

> columns of data but found 1"

> It seems .import is not recognizing comma delimiter.

> I suspect this is a simple syntax error, but I don't find any

> document/explanation.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Simon Slavin

On 7 May 2012, at 4:41pm, peter korinis  wrote:

> My input file is a comma-delimited text file
> When I run .import I get the following "Error: FILE line 1: expected 46
> columns of data but found 1"
> It seems .import is not recognizing comma delimiter.



"The default separator is a pipe symbol ("|")."

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread Jonas Malaco Filho
Have you run *.mode csv*?

Jonas Malaco Filho



2012/5/7 peter korinis 

> Regarding SQLITE3.exe statement ".import FILE TABLE"
> I created a table.
> My input file is a comma-delimited text file
> When I run .import I get the following "Error: FILE line 1: expected 46
> columns of data but found 1"
> It seems .import is not recognizing comma delimiter.
> I suspect this is a simple syntax error, but I don't find any
> document/explanation.
>
> Thanks,
> peter
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
> Sent: Friday, May 04, 2012 11:16 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>
> On 4 May 2012, at 4:02pm, peter korinis  wrote:
>
> > Sqlitespy looks good ... I will try it.
> > website says download contains sqlite itself, which I already have -
> > will there be a problem using ...spy with existing sqlite?
>
> SQLite is not a single library which has to live somewhere on your
> computer.
> It is C code which each programmer includes in their program.  You can have
> twenty programs on your disk, each using a different version of SQLite,
> without problems.
>
> > I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;"
> > to load a 999x46 comma-delimited file into a previously created empty
> > table with 46 col. (if this works I will load two 22M row x 46 col csv
> > files into that
> > table.) does this cmd work this way or must I create INSERT statements
> > to do
> > 999 inserts (later 44M inserts)?
>
> Semicolons are needed at the end of SQL commands.  You don't want them at
> the end of commands which start with a dot.  Apart from that you have
> something worth trying.  Why not make a tiny test case with three lines of
> two columns before you get started with the 46-column monster ?
>
> Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Regarding SQLITE3.exe statement ".import FILE TABLE"
I created a table.
My input file is a comma-delimited text file
When I run .import I get the following "Error: FILE line 1: expected 46
columns of data but found 1"
It seems .import is not recognizing comma delimiter.
I suspect this is a simple syntax error, but I don't find any
document/explanation.

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
Sent: Friday, May 04, 2012 11:16 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file


On 4 May 2012, at 4:02pm, peter korinis  wrote:

> Sqlitespy looks good ... I will try it. 
> website says download contains sqlite itself, which I already have - 
> will there be a problem using ...spy with existing sqlite?

SQLite is not a single library which has to live somewhere on your computer.
It is C code which each programmer includes in their program.  You can have
twenty programs on your disk, each using a different version of SQLite,
without problems.

> I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" 
> to load a 999x46 comma-delimited file into a previously created empty 
> table with 46 col. (if this works I will load two 22M row x 46 col csv 
> files into that
> table.) does this cmd work this way or must I create INSERT statements 
> to do
> 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands.  You don't want them at
the end of commands which start with a dot.  Apart from that you have
something worth trying.  Why not make a tiny test case with three lines of
two columns before you get started with the 46-column monster ?

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Udi Karni
Hi Michael,

May I make a request on behalf of the C++ challenged crowd?

Is it possible to fuse your contribution with the existing capabilities of
the ".import" command so that it can be activated via syntax similar to -
"RECORDS DELIMITED BY... FIELDS TERMINATED BY... OPTIONALLY ENCLOSED BY..."

It would make ".import" so much more powerful.

Thanks !

On Sun, May 6, 2012 at 5:59 AM, Black, Michael (IS)
wrote:

> I modified my csvimport to allow for quoted fields.  Still automatically
> detects the # of columns.
> If quoted fields also contain internal quotes they must be escaped by
> doubling them.
> e.g.
> col1,"this is, a ""quoted"" field",col3
> It's not fully RFC4180 compliant as it doesn't allow for CR/LF inside
> quotes -- does anybody really need/use that?
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include "sqlite3.h"
> #define BUFSIZE 100
> #define MOD 10
> #define MAXTOK 10
> char *nextToken(char *line) {
>  static char token[MAXTOK];
>  static char *p;
>  int n=0;
>  if (line) {
>p = line;
>  }
>  while(*p && *p != ',' && *p!='\r' && *p!= '\n') {
>if (*p == '"') { // quoted field
>  ++p; // move past first quote
>  while((*p && *p!='"') || *(p+1)=='"') {
>if (*p == '"' && *(p+1)=='"') { // escaped quote?
>  token[n++]='"';
>  p+=2;
>  continue;
>}
>token[n++] = *p;
>++p;
>  }
>  token[n++]=0;
>  ++p;
>  if (*p == ',') ++p;
>  return token;
>}
>token[n++]=*p;
>++p;
>  }
>  if (n > MAXTOK) {
>fprintf(stderr,"Token too big??\n");
>exit(1);
>  }
>  token[n]=0;
>  ++p;
>  if (*p == ',') ++p;
>  if (n > 0) {
>return token;
>  }
>  return NULL;
> }
> // Add comma delimited file to existing database/table
> // Quoted strings are accepted
> int main(int argc, char *argv[]) {
>  sqlite3 *db;
>  sqlite3_stmt *stmt;
>  int rc;
>  int ncol=0;
>  int nline=0;
>  char *buf=malloc(BUFSIZE);
>  char sql[8192];
>  FILE *fp;
>  char *filename;
>  char *databasename;
>  char *tablename;
>  if (argc != 4) {
>fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
>exit(1);
>  }
>  filename = argv[1];
>  databasename = argv[2];
>  tablename = argv[3];
>  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
>  if (rc) {
> fprintf(stderr,"Error opening database '%s': %s\n",databasename,sqlite3_errmsg(db));
>exit(1);
>  }
>  sprintf(sql,"insert into %s values (",tablename);
>  fp=fopen(filename,"r");
>  if (fp == NULL) {
>perror(filename);
>exit(1);
>  }
>  buf[BUFSIZE-1] = '*';
>  fgets(buf,BUFSIZE,fp);
>  if (buf[BUFSIZE-1] != '*') {
>fprintf(stderr,"BUFSIZE not big enough...aborting\n");
>exit(1);
>  }
>  // count the columns
>  char *p=nextToken(buf);
>  ncol=0;
>  while(p) {
>++ncol;
>strcat(sql,ncol==1?"":",");
>strcat(sql,"?");
>p=nextToken(NULL);
>  }
>  printf("%d columns detected\n",ncol);
>  strcat(sql,")");
>  rewind(fp);
>  // Let's wrap things in a transaction
>  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
>  if (rc) {
>fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  // prepare our statement
>  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
>  if (rc) {
>fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  // Loop over file file
>  while(fgets(buf,BUFSIZE,fp)) {
>char *p=nextToken(buf);
>int i=1;
>++nline;
>if ((nline % MOD)==0) {
>  printf("%d\r",nline);
>  fflush(stdout);
>}
>while(p) { // bind the columns as text; table will take care of conversion to column types
>  rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
>  if (rc) {
>fprintf(stderr,"bind_text failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  ++i;
>  p=nextToken(NULL);
>}
>if (--i != ncol) {
>  fprintf(stderr,"expected %d cols, got %d cols on line#%d\n",ncol,i,nline);
>} else {
>  rc = sqlite3_step(stmt);
>  if (rc != SQLITE_DONE) {
>fprintf(stderr,"Insert failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  rc = sqlite3_reset(stmt);
>  if (rc) {
>fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>}
>  }
>  rc=sqlite3_finalize(stmt);
>  if (rc) {
>fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
>exit(1);
>  }
>  printf("%d inserts, committing...\n",nline);
>  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
>  if (rc) {
>fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  rc=sqlite3_close(db);
>  if (rc) {
>fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
>exit(1);
>  }
>  fclose(fp);
>  return 0;
> }
>
>
>
>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating 

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Jean-Christophe Deschamps

Hi Simon,

My test cases weren't intended to supply data, they're crash tests.  I 
have faith that Mike's software correctly interprets syntactically 
correct cases.  I was interested in making sure it didn't crash, hang 
or spew on syntactic errors.


You're right.

JcD 




Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Simon Slavin

On 6 May 2012, at 3:20pm, Jean-Christophe Deschamps  wrote:

> Also fields starting with a " delimiter should comply with the rules.

My test cases weren't intended to supply data, they're crash tests.  I have 
faith that Mike's software correctly interprets syntactically correct cases.  I 
was interested in making sure it didn't crash, hang or spew on syntactic errors.

('spew' is to loop around generating endless output.  Not a crash nor a hang, a 
third common failure mode.  And one that fills up disk until the OS terminates 
the app.)

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Jean-Christophe Deschamps



> Not sure I buy your test cases.

I don't either, but for diverging reasons.

> Fields containing double quotes must be in double quotes.  So only one
> of your examples is valid
>
> first,""second"",third
>
> I'll have to fix that one.


No, it's invalid.  Apply to " the same rules that apply to ' for 
literals in SQL statements: if user data consists of 'ABC' (including 
the quotes), then the fragment SQL is

WHERE column = '''ABC''' (i.e. you still need the outer quotes).

So the above should be:
first,"""second""",third

BUT, fields that don't start with a " delimiter shouldn't be considered 
delimited.


first,second"",third

has 3 data fields for me:
first
second""
third

Also fields starting with a " delimiter should comply with the rules.

first,""second,third
^-- missing comma separator

> I could add the CR/LF...but I've never seen that used in a CSV
> format.  Then again, may as well use the standard.  That's the nice
> thing about standards...there are so many to choose from :-)


I've had to process many CSVs with either 0x00, CR, LF or CRLF 
embedded. Some TSV (Tab separated values), some with ; as separators, 
even some with multiline comments...


JcD 




Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Simon Slavin

On 6 May 2012, at 3:03pm, "Black, Michael (IS)"  wrote:

> Not sure I buy your test cases.
> 
> Fields containing double quotes must be in double quotes.  So only one of your 
> examples is valid
> 
> first,""second"",third
> 
> I'll have to fix that one.

Sure.  The rest aren't interpretation tests.  They're crash tests: your app 
shouldn't crash, hang, or spew on them.  It doesn't have to turn them into 
something they're not.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Black, Michael (IS)
Not sure I buy your test cases.

Fields containing double quotes must be in double quotes.  So only one of your 
examples is valid

first,""second"",third

I'll have to fix that one.



I could add the CR/LF...but I've never seen that used in a CSV format.  Then 
again, may as well use the standard.  That's the nice thing about 
standards...there are so many to choose from :-)





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Simon Slavin [slav...@bigfraud.org]
Sent: Sunday, May 06, 2012 8:35 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file


On 6 May 2012, at 1:59pm, "Black, Michael (IS)"  wrote:

> I modified my csvimport to allow for quoted fields.  Still automatically 
> detects the # of columns.
> If quoted fields also contain internal quotes they must be escaped by 
> doubling them.
> e.g.
> col1,"this is, a ""quoted"" field",col3

I trust you as a competent programmer, but I have test cases for you:

first",second,"third
first,""second,third
first"",second,third
first,""second"",third
first,""second,""third
first"",second,""third
first"",second"",third

producing error messages (or how many columns are filled) is up to you.

> It's not fully RFC4180 compliant as it doesn't allow for CR/LF inside quotes 
> -- does anybody really need/use that?

Yes, in that I have multi-line address fields and multi-paragraph notes fields. 
 But we don't store that data in CSV files, so it wouldn't matter that a CSV 
import tool couldn't handle it.  Actually that system uses MySQL not SQLite.  
Might be interesting to see if the SQLite shell tool failed to handle it.

We had a lot of discussion about allowing CR (which is the character we use) 
but decided that multi-paragraph notes fields was an idea whose time had come.

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Simon Slavin

On 6 May 2012, at 1:59pm, "Black, Michael (IS)"  wrote:

> I modified my csvimport to allow for quoted fields.  Still automatically 
> detects the # of columns.
> If quoted fields also contain internal quotes they must be escaped by 
> doubling them.
> e.g.
> col1,"this is, a ""quoted"" field",col3

I trust you as a competent programmer, but I have test cases for you:

first",second,"third
first,""second,third
first"",second,third
first,""second"",third
first,""second,""third
first"",second,""third
first"",second"",third

producing error messages (or how many columns are filled) is up to you.

> It's not fully RFC4180 compliant as it doesn't allow for CR/LF inside quotes 
> -- does anybody really need/use that?

Yes, in that I have multi-line address fields and multi-paragraph notes fields. 
 But we don't store that data in CSV files, so it wouldn't matter that a CSV 
import tool couldn't handle it.  Actually that system uses MySQL not SQLite.  
Might be interesting to see if the SQLite shell tool failed to handle it.

We had a lot of discussion about allowing CR (which is the character we use) 
but decided that multi-paragraph notes fields was an idea whose time had come.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
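Both points above — the doubled-quote escaping Michael describes and the CR/LF-inside-quotes case Simon raises — are part of RFC 4180. As a neutral illustration of the expected behaviour (Python's standard csv module here, not the C tool under discussion):

```python
import csv
import io

# RFC 4180: a doubled quote inside a quoted field decodes to one quote,
# and a quoted field may legally contain commas and even line breaks.
data = 'col1,"this is, a ""quoted"" field",col3\r\n"multi\nline",second\r\n'
rows = list(csv.reader(io.StringIO(data)))

print(rows[0])  # ['col1', 'this is, a "quoted" field', 'col3']
print(rows[1])  # ['multi\nline', 'second']
```

Any importer that claims full RFC 4180 support has to reproduce both results.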


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Black, Michael (IS)
I modified my csvimport to allow for quoted fields.  Still automatically 
detects the # of columns.
If quoted fields also contain internal quotes they must be escaped by doubling 
them.
e.g.
col1,"this is, a ""quoted"" field",col3
It's not fully RFC4180 compliant as it doesn't allow for CR/LF inside quotes -- 
does anybody really need/use that?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"
#define BUFSIZE 1000000
#define MOD 100000
#define MAXTOK 100000
char *nextToken(char *line) {
  static char token[MAXTOK];
  static char *p;
  int n=0;
  if (line) {
    p = line;
  }
  while(*p && *p != ',' && *p!='\r' && *p!= '\n') {
    if (*p == '"') { // quoted field
      ++p; // move past first quote
      while((*p && *p!='"') || *(p+1)=='"') {
        if (n >= MAXTOK-1) { // check before writing, so we never overflow token[]
          fprintf(stderr,"Token too big??\n");
          exit(1);
        }
        if (*p == '"' && *(p+1)=='"') { // escaped quote?
          token[n++]='"';
          p+=2;
          continue;
        }
        token[n++] = *p;
        ++p;
      }
      token[n]=0;
      ++p; // move past closing quote
      if (*p == ',') ++p;
      return token;
    }
    if (n >= MAXTOK-1) {
      fprintf(stderr,"Token too big??\n");
      exit(1);
    }
    token[n++]=*p;
    ++p;
  }
  token[n]=0;
  ++p;
  if (*p == ',') ++p;
  if (n > 0) {
    return token;
  }
  return NULL;
}
// Add comma-delimited file to existing database/table
// Quoted strings are accepted
int main(int argc, char *argv[]) {
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int rc;
  int ncol=0;
  int nline=0;
  char *buf=malloc(BUFSIZE);
  char sql[8192];
  FILE *fp;
  char *filename;
  char *databasename;
  char *tablename;
  if (argc != 4) {
fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
exit(1);
  }
  filename = argv[1];
  databasename = argv[2];
  tablename = argv[3];
  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
  if (rc) {
    fprintf(stderr,"Error opening database '%s': %s\n",databasename,sqlite3_errmsg(db));
    exit(1);
  }
  sprintf(sql,"insert into %s values (",tablename);
  fp=fopen(filename,"r");
  if (fp == NULL) {
perror(filename);
exit(1);
  }
  buf[BUFSIZE-1] = '*';
  fgets(buf,BUFSIZE,fp);
  if (buf[BUFSIZE-1] != '*') {
fprintf(stderr,"BUFSIZE not big enough...aborting\n");
exit(1);
  }
  // count the columns
  char *p=nextToken(buf);
  ncol=0;
  while(p) {
++ncol;
strcat(sql,ncol==1?"":",");
strcat(sql,"?");
p=nextToken(NULL);
  }
  printf("%d columns detected\n",ncol);
  strcat(sql,")");
  rewind(fp);
  // Let's wrap things in a transaction
  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
  if (rc) {
fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  // prepare our statement
  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
  if (rc) {
fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  // Loop over the file
  while(fgets(buf,BUFSIZE,fp)) {
char *p=nextToken(buf);
int i=1;
++nline;
if ((nline % MOD)==0) {
  printf("%d\r",nline);
  fflush(stdout);
}
    while(p) { // bind the columns as text; the table will take care of conversion to column types
  rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
  if (rc) {
fprintf(stderr,"bind_text failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  ++i;
  p=nextToken(NULL);
}
if (--i != ncol) {
  fprintf(stderr,"expected %d cols, got %d cols on line#%d\n",ncol,i,nline);
} else {
  rc = sqlite3_step(stmt);
  if (rc != SQLITE_DONE) {
fprintf(stderr,"Insert failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  rc = sqlite3_reset(stmt);
  if (rc) {
fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
}
  }
  rc=sqlite3_finalize(stmt);
  if (rc) {
fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
exit(1);
  }
  printf("%d inserts, committing...\n",nline);
  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
  if (rc) {
fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  rc=sqlite3_close(db);
  if (rc) {
fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
exit(1);
  }
  fclose(fp);
  return 0;
}
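For anyone who wants the same import strategy without a C compiler, here is a rough equivalent using only Python's standard library (an illustrative sketch, not part of Michael's tool): detect the column count from the first row, reuse one parameterized INSERT, and wrap the whole load in a single transaction.

```python
import csv
import sqlite3

def csv_import(csv_path, db_path, table):
    """Bulk-load a CSV file into an existing table, in one transaction."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        first = next(reader)                  # column count from row 1
        # table name is trusted here; only values go through placeholders
        sql = "INSERT INTO %s VALUES (%s)" % (table, ",".join("?" * len(first)))
        con = sqlite3.connect(db_path)
        with con:                             # BEGIN ... COMMIT around the load
            con.execute(sql, first)           # first row is data, not a header
            con.executemany(sql, reader)      # statement is prepared once
        con.close()
```

As with the C version, the table must already exist and the first line is treated as data, not as a header row.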




Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems



From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Oliver Schneider [oli...@f-prot.com]
Sent: Friday, May 04, 2012 12:40 PM
To: sqlite-users@sqlite.org
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file


On 2012-05-01 20:41, Baruch Burstein wrote:
> It is already wrapped in a transaction.
> I seem to remember seeing somewhere that the .import command doesn't
> understand escaping, e.g.
>
> "one","two,three"
>
> will get imported as
>
> "one" | "two | three"
>
> (the quotes are part of the data, and the second column was split into 

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-05 Thread Valentin Davydov
On Fri, May 04, 2012 at 11:23:42AM -0400, peter korinis wrote:
>
> After I get the data loaded and inspect for nulls in prospective index
> attributes, can I add indices?

Yes, of course. Moreover, it would be much faster to add indices at once 
at the end rather than create them beforehand and then update with every new
piece of data.
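A tiny illustration of that load-then-index order (Python's stdlib sqlite3 here, purely to show the SQL; the table and row counts are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs(rec INTEGER, val TEXT)")   # deliberately no index yet

# Bulk-load first, inside one transaction...
with con:
    con.executemany("INSERT INTO obs VALUES (?, ?)",
                    ((i, "v%d" % i) for i in range(10000)))

# ...then build the index once over the finished table, instead of
# updating it on every one of the 10,000 inserts.
con.execute("CREATE INDEX obs_rec ON obs(rec)")
print(con.execute("SELECT COUNT(*) FROM obs WHERE rec < 100").fetchone()[0])  # 100
```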

> I was planning to load using sqlite3 CLI ".import" command. Is there a way I
> can monitor the progress of the load, with only minimal impact on
> performance ?

You can monitor (by some external means) either the database file size or 
read pointer position in the input data file. Both of them grow linearly 
with amount of data processed.
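That monitoring can also be scripted instead of watched by eye; this sketch (the path and polling interval are placeholders) just reports the database file's size as the load runs:

```python
import os
import time

def watch_growth(path, interval=5.0, ticks=3):
    """Report the file's size `ticks` times, `interval` seconds apart."""
    sizes = []
    for _ in range(ticks):
        size = os.path.getsize(path) if os.path.exists(path) else 0
        sizes.append(size)
        print("%s: %d bytes" % (path, size))
        time.sleep(interval)
    return sizes
```

Run it in a second terminal while the import is going; a size that never grows is the early warning the poster was missing.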

> I've started several loads only to find out hours later that
> nothing has been loaded.

Anyway, be prepared to spend some of your time on learning.

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Oliver Schneider
On 2012-05-02 14:06, peter korinis wrote:
> Thank you all. 
> Looks like I'm stuck with the CLI, though I have contacted Nucleon software
> support ... tried CLI yesterday but need more practice.
> Is there a good reference book you would recommend for SQLite?
Absolutely. The one by Mike Owens was a great read. The first two or
three chapters explain SQL as a whole. Even though I _had_ to work with
MySQL and stuff before, it was this book that actually "enlightened" me
w.r.t. some of the cool features SQL (RDBMSs) give you.

Name: "The Definitive Guide to SQLite"

There is another one from O'Reilly whose author is also on this list. I
think I would have found this one harder to grasp without reading the
Owens book before, though.

// Oliver

PS: re-sending after the mailing list choked on my message with S/MIME
signature.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Oliver Schneider
On 2012-05-01 20:41, Baruch Burstein wrote:
> It is already wrapped in a transaction.
> I seem to remember seeing somewhere that the .import command doesn't
> understand escaping, e.g.
> 
> "one","two,three"
> 
> will get imported as
> 
> "one" | "two | three"
> 
> (the quotes are part of the data, and the second column was split into two
> by the comma)
> Just a point to be aware of.
That could be mitigated by writing a small script that "converts" the
CSV contents line-wise to SQL-statements, right?
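Right — such a converter is only a few lines. A sketch (the script and table names are invented for the example) that doubles embedded single quotes so every value becomes a safe SQL string literal, and brackets the output in one transaction:

```python
import csv
import sys

def csv_to_sql(lines, table):
    """Yield one INSERT statement per CSV row, wrapped in a transaction."""
    yield "BEGIN;"
    for row in csv.reader(lines):
        # '' is the SQL escape for a literal single quote
        vals = ",".join("'%s'" % v.replace("'", "''") for v in row)
        yield "INSERT INTO %s VALUES (%s);" % (table, vals)
    yield "COMMIT;"

if __name__ == "__main__":
    # e.g.  python csv2sql.py < data.csv | sqlite3 mydb.db
    for stmt in csv_to_sql(sys.stdin, "mytable"):
        print(stmt)
```

Piping the output into the sqlite3 shell sidesteps the .import command's CSV parsing entirely.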


// Oliver

PS: re-sending after the mailing lists chokes on my message with S/MIME
signature.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Black, Michael (IS)
Here's a utility to import a comma separated file (does not work for quoted 
strings or strings with commas).



Figures out the # of columns automagically from the csv file.



All wrapped in a singled transaction.

Shows progress every 100,000 inserts.



Usage: csvimport filename databasename tablename



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

#define BUFSIZE 1000000
#define MOD 100000

char *sep=",\r\n";
// Add comma-delimited file to existing database/table
// Strings cannot have commas in them and quotes will be retained
int main(int argc, char *argv[]) {
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int rc;
  int ncol=0;
  int nline=0;
  char *buf=malloc(BUFSIZE);
  char sql[8192];
  FILE *fp;
  char *filename;
  char *databasename;
  char *tablename;
  if (argc != 4) {
fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
exit(1);
  }
  filename = argv[1];
  databasename = argv[2];
  tablename = argv[3];
  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
  if (rc) {
    fprintf(stderr,"Error opening database '%s': %s\n",databasename,sqlite3_errmsg(db));
    exit(1);
  }
  sprintf(sql,"insert into %s values (",tablename);
  fp=fopen(filename,"r");
  if (fp == NULL) {
perror(filename);
exit(1);
  }
  buf[BUFSIZE-1] = '*';
  fgets(buf,BUFSIZE,fp);
  if (buf[BUFSIZE-1] != '*') {
fprintf(stderr,"BUFSIZE not big enough...aborting\n");
exit(1);
  }
  // count the columns
  char *p=strtok(buf,sep);
  ncol=0;
  while(p) {
++ncol;
strcat(sql,ncol==1?"":",");
strcat(sql,"?");
p=strtok(NULL,sep);
  }
  printf("%d columns detected\n",ncol);
  strcat(sql,")");
  puts(sql);
  rewind(fp);
  // Let's wrap things in a transaction
  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
  if (rc) {
fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  // prepare our statement
  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
  if (rc) {
fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  // Loop over the file
  while(fgets(buf,BUFSIZE,fp)) {
char *p=strtok(buf,sep);
int i=1;
++nline;
if ((nline % MOD)==0) {
  printf("%d\r",nline);
  fflush(stdout);
}
    while(p) { // bind the columns as text; the table will take care of conversion to column types
  rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
  if (rc) {
fprintf(stderr,"bind_text failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  ++i;
  p=strtok(NULL,sep);
}
    if (--i != ncol) {
      fprintf(stderr,"expected %d cols, got %d cols on line#%d\n",ncol,i,nline);
    } else {
  rc = sqlite3_step(stmt);
  if (rc != SQLITE_DONE) {
fprintf(stderr,"Insert failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  rc = sqlite3_reset(stmt);
  if (rc) {
fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
}
  }
  rc=sqlite3_finalize(stmt);
  if (rc) {
fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
exit(1);
  }
  printf("%d inserts, committing...\n",nline);
  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
  if (rc) {
fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
exit(1);
  }
  rc=sqlite3_close(db);
  if (rc) {
fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
exit(1);
  }
  fclose(fp);
  return 0;
}



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of peter korinis [kori...@earthlink.net]
Sent: Friday, May 04, 2012 10:23 AM
To: 'General Discussion of SQLite Database'
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file

I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only  query, sort, etc. type transactions.
So I want to load two 22GB csv files into an empty 46 column table. (I
intend to test load with 999 records by 46 col file.) initially I only have
1 index on a record # ... am not positive several other fields that I want
to index may not be missing data in some records (I assume that will error
out if I make those an index).
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using sqlite3 CLI ".import" command. Is there a way I
can monitor the progress of the load, with only minimal impact on
performance ? I've started several loads only to find out hours later that
nothing has been loaded.

Thanks for helpful info.

peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Valentin 

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Simon Slavin

On 4 May 2012, at 4:23pm, peter korinis  wrote:

> I have scaled down the attributes of interest to 46 columns (discarding the
> other 550). No columns are calculated. No updates to this file ... one user
> ... only  query, sort, etc. type transactions. 
> So I want to load two 22GB csv files into an empty 46 column table. (I
> intend to test load with 999 records by 46 col file.) initially I only have
> 1 index on a record # ... am not positive several other fields that I want
> to index may not be missing data in some records (I assume that will error
> out if I make those an index). 
> After I get the data loaded and inspect for nulls in prospective index
> attributes, can I add indices?

Yes.  You have done 'CREATE TABLE'.  The import process will execute many 
'INSERT' commands.  And you can 'CREATE INDEX' commands after your data is 
already present in the table.  It will work the way you want.

> I was planning to load using sqlite3 CLI ".import" command. Is there a way I
> can monitor the progress of the load, with only minimal impact on
> performance ?

Using File Explorer or the Finder or whatever, just keep an information window 
on the database file open.  You should see the file size increasing constantly 
while the command is working.  Alternatively you might use Task Manager or 
Activity Monitor to monitor something else that's going on.  For example, how 
many octets that process has written to disk.

> I've started several loads only to find out hours later that
> nothing has been loaded.

Yes, that's very annoying.  You keep thinking "It must be finished soon.".

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only  query, sort, etc. type transactions. 
So I want to load two 22GB csv files into an empty 46 column table. (I
intend to test load with 999 records by 46 col file.) initially I only have
1 index on a record # ... am not positive several other fields that I want
to index may not be missing data in some records (I assume that will error
out if I make those an index). 
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using sqlite3 CLI ".import" command. Is there a way I
can monitor the progress of the load, with only minimal impact on
performance ? I've started several loads only to find out hours later that
nothing has been loaded.

Thanks for helpful info.

peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Valentin Davydov
Sent: Friday, May 04, 2012 9:43 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user 
> with no dev support for a pilot project (single user, no updates, just
queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows 
> x 600 columns (fields all <15 char). Seems like a DBMS will allow me 
> to query it in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of
data, especially selecting small subsets based on a simple criteria. However
before trying to do some real work you have to understand the structure of
your data, realize your possible queries and carefully design database
schema (tables and, equally important, indexes). Perhaps, putting all data
in a single 600-column table is not a good idea (though allowed
technically), especially if your columns are equal by their physical nature:
it is not so easy to select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel 
> dual-proc with 4GB RAM + 200GB free disk space.

Well-indexed database of small data pieces usually takes up several times
more disk space than the raw data. Probably 200GB would not be enough,
depending mostly on the number of indexes. Consider dedicating a separate
disk (or even RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I 
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager 
> add-on but it would not load the csv files - 'csv worker failed'. So I 
> tried Database Master from Nucleon but it failed after loading (it 
> took 100
> minutes) ~57,000 rows with error message = 'database or disk is full'. 
> I tried to create another table in the same db but could not with same 
> error message. The DB size shows as 10,000KB (that looks suspiciously 
> like a size setting?).

Try bare sqlite shell instead of those external tools. It should take at
least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3
terabytes of disk space, contains more than 10^10 records and still provides
small selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find
there. SQLite isn't very good for calculation of complex aggregate
functions, but works fine in simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and
perhaps CACHE_SIZE to fill most of the available RAM would help a bit. 
Also, don't forget to turn off journaling and wrap all in a single
transaction when creating database for the first time.
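In PRAGMA terms that advice looks like this (sketched with Python's stdlib sqlite3; the values are examples, and journaling should only be switched off for a throwaway initial load you can redo):

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "load.db")
con = sqlite3.connect(db)

con.execute("PRAGMA page_size = 4096")        # set before the first table; match FS cluster size
con.execute("PRAGMA cache_size = -1000000")   # negative value = size in KiB (~1 GB here)
mode = con.execute("PRAGMA journal_mode = OFF").fetchone()[0]  # load-time only!

con.execute("CREATE TABLE t(a, b)")
with con:                                     # one big transaction for the whole load
    con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])
print(mode)  # off
```

With journal_mode OFF a crash mid-load can corrupt the file, which is acceptable only because the database can be rebuilt from the CSV.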

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Simon Slavin

On 4 May 2012, at 4:02pm, peter korinis  wrote:

> Sqlitespy looks good ... I will try it. 
> website says download contains sqlite itself, which I already have - will
> there be a problem using ...spy with existing sqlite?

SQLite is not a single library which has to live somewhere on your computer.  
It is C code which each programmer includes in their program.  You can have 
twenty programs on your disk, each using a different version of SQLite, without 
problems.

> I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
> a 999x46 comma-delimited file into a previously created empty table with 46
> col. (if this works I will load two 22M row x 46 col csv files into that
> table.) does this cmd work this way or must I create INSERT statements to do
> 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands.  You don't want them at the 
end of commands which start with a dot.  Apart from that you have something 
worth trying.  Why not make a tiny test case with three lines of two columns 
before you get started with the 46-column monster ?

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
Sqlitespy looks good ... I will try it. 
website says download contains sqlite itself, which I already have - will
there be a problem using ...spy with existing sqlite?

I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
a 999x46 comma-delimited file into a previously created empty table with 46
col. (if this works I will load two 22M row x 46 col csv files into that
table.) does this cmd work this way or must I create INSERT statements to do
999 inserts (later 44M inserts)?

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Oliver Peters
Sent: Thursday, May 03, 2012 7:23 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 03.05.2012 19:59, peter korinis wrote:
> I have R but really haven't used it much. I know it's a great stats 
> package and great for data reduction ... but I want to perform queries 
> against my 44GB of data, filtering records by a variety of attributes, 
> comparing those subsets in a variety of ad hoc ways, perhaps 
> summing/counting other fields, etc.

I prefer

http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index

for creating the database (tables, views) and doing the queries cause it's
fast and reliable but I prefer to write(!) SQL code and not to create it
through a generator (as it is done in Access).

sqlitespy can't do the import job; I always do this with the CLI by creating
INSERT statements with my scripting language in a separate file

since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3); INSERT INTO
table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

this is a great advantage if you need to do many INSERTs cause your file
won't become so large
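Any SQLite at or above 3.7.11 — including the library bundled with every current Python, used below only to demonstrate — accepts the multi-row form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(col01, col02, col03)")

# One statement, several rows (needs SQLite >= 3.7.11).
con.execute("INSERT INTO t(col01,col02,col03) VALUES (1,2,3),(4,5,6)")
print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2
```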


> This is the kind of job Excel is good at ... but the data is too big!
> Seems like a database plus a good query GUI or some BI app would work. is
R
> a good query tool?

afaik there is no other way than to write (!) SQL Code - depending on 
the problem this can be done in an R script or directly in the database 
(i.e. as a VIEW) or as a combination

[...]

Oliver


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Valentin Davydov
On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data. 

Yes, SQLite is quite capable of doing simple analysis of such amounts of data,
especially selecting small subsets based on a simple criteria. However before 
trying to do some real work you have to understand the structure of your data,
realize your possible queries and carefully design database schema (tables 
and, equally important, indexes). Perhaps, putting all data in a single 
600-column table is not a good idea (though allowed technically), especially
if your columns are equal by their physical nature: it is not so easy to
select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
> with 4GB RAM + 200GB free disk space.

Well-indexed database of small data pieces usually takes up several times 
more disk space than the raw data. Probably 200GB would not be enough, 
depending mostly on the number of indexes. Consider dedicating a separate
disk (or even RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full'. I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).

Try bare sqlite shell instead of those external tools. It should take 
at least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes 
of disk space, contains more than 10^10 records and still provides small 
selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find 
there. SQLite isn't very good for calculation of complex aggregate functions,
but works fine in simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and 
perhaps CACHE_SIZE to fill most of the available RAM would help a bit. 
Also, don't forget to turn off journaling and wrap all in a single 
transaction when creating database for the first time.

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread Oliver Peters

On 03.05.2012 19:59, peter korinis wrote:

I have R but really haven't used it much. I know it's a great stats package
and great for data reduction ... but I want to perform queries against my
44GB of data, filtering records by a variety of attributes, comparing those
subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
etc.


I prefer

http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index

for creating the database (tables, views) and doing the queries cause 
it's fast and reliable but I prefer to write(!) SQL code and not to 
create it through a generator (as it is done in Access).


sqlitespy can't do the import job; I always do this with the CLI by 
creating INSERT statements with my scripting language in a separate file


since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3);
INSERT INTO table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

this is a great advantage if you need to do many INSERTs cause your file 
won't become so large




This is the kind of job Excel is good at ... but the data is too big!
Seems like a database plus a good query GUI or some BI app would work. is R
a good query tool?


afaik there is no other way than to write (!) SQL Code - depending on 
the problem this can be done in an R script or directly in the database 
(i.e. as a VIEW) or as a combination


[...]

Oliver


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread Warren Young

On 5/3/2012 11:59 AM, peter korinis wrote:

is R a good query tool?


It's a programming language.  It can do anything within your power to 
persuade the interpreter.


One of the fundamental data types in R is the data frame, which is 
roughly equivalent to a SQLite table.


This is an R equivalent to "SELECT * FROM MYTABLE WHERE V2 < 9":

results <- subset(my.table, V2 < 9)

But, this is not the place for an R tutorial.  Take the question up on 
an R mailing list if you want to learn more of its capabilities.

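For comparison, here is the SQL side of that equivalence (Python's stdlib sqlite3, with an invented three-row table mirroring the R names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable(V1, V2)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [("a", 3), ("b", 9), ("c", 12)])

# Equivalent of R's:  results <- subset(my.table, V2 < 9)
results = con.execute("SELECT * FROM mytable WHERE V2 < 9").fetchall()
print(results)  # [('a', 3)]
```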


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread Freddy López
As Oliver said, I don't think the real issue is having to choose one piece
of software: either SQLite or R.

I've used SQLite with data around 10GB under Windows 7 with 4GB of RAM and
it worked perfectly. Yes, its size is less than yours, but I learned that
every GUI I tried was a problem.

Since that, I always work simply with sqlite3.exe and other tools such as
SCITE to write queries.

Cheers.


On Thu, May 3, 2012 at 1:29 PM, peter korinis  wrote:

> I have R but really haven't used it much. I know it's a great stats package
> and great for data reduction ... but I want to perform queries against my
> 44GB of data, filtering records by a variety of attributes, comparing those
> subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
> etc.
> This is the kind of job Excel is good at ... but the data is too big!
> Seems like a database plus a good query GUI or some BI app would work. is R
> a good query tool?
>
> Thanks,
> peter
>
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Warren Young
> Sent: Thursday, May 03, 2012 9:36 AM
> To: General Discussion of SQLite Database
>  Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>  On 5/1/2012 2:06 PM, peter korinis wrote:
> > Is SQLite the wrong tool for this project?
>
> Probably.
>
> SQLite is a data storage tool.  With enough SQL cleverness, you can turn it
> into a data *reduction* tool.  But a data analysis tool?  No, not without
> marrying it to a real programming language.
>
> Granted, that's what almost everyone does do with SQLite, but if you're
> going to learn a programming language, I'd recommend you learn R, a
> language
> and environment made for the sort of problem you find yourself stuck with.
> http://r-project.org/
>
> There are several R GUIs out there.  I like R Studio best:
> http://www.rstudio.org/
>
> You'll still find R Studio a sharp shock compared to Excel.  And yes, it
> will require some programming, and yes, I know you said you aren't a
> programmer.  But in the rest of the thread, it looks like people have
> convinced you to use SQLite from the command line, typing in raw SQL
> commands; guess what, that's programming.  Not on the level of R code, but
> R
> isn't far down the same slippery slope.
>
> It may help you to know that R is most popular in the statistics community,
> which of course is populated by statisticians, not programmers.
>
> R isn't the easiest programming language to pick up, but it's far from the
> hardest.  It's very similar to JavaScript, though a bit tougher to learn,
> mostly due to having accumulated some strange syntax over its 36 years.
> (That's counting R's predecessor, S.)



-- 
«But Gwindor answered: 'The doom lies in yourself, not in your name.'»

JRR Tolkien


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread peter korinis
I have R but really haven't used it much. I know it's a great stats package
and great for data reduction ... but I want to perform queries against my
44GB of data, filtering records by a variety of attributes, comparing those
subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
etc. 
This is the kind of job Excel is good at ... but the data is too big!
Seems like a database plus a good query GUI or some BI app would work. Is R
a good query tool?

Thanks,
peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Warren Young
Sent: Thursday, May 03, 2012 9:36 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 5/1/2012 2:06 PM, peter korinis wrote:
> Is SQLite the wrong tool for this project?

Probably.

SQLite is a data storage tool.  With enough SQL cleverness, you can turn it
into a data *reduction* tool.  But a data analysis tool?  No, not without
marrying it to a real programming language.

Granted, that's what almost everyone does do with SQLite, but if you're
going to learn a programming language, I'd recommend you learn R, a language
and environment made for the sort of problem you find yourself stuck with.
http://r-project.org/

There are several R GUIs out there.  I like R Studio best: 
http://www.rstudio.org/

You'll still find R Studio a sharp shock compared to Excel.  And yes, it
will require some programming, and yes, I know you said you aren't a
programmer.  But in the rest of the thread, it looks like people have
convinced you to use SQLite from the command line, typing in raw SQL
commands; guess what, that's programming.  Not on the level of R code, but R
isn't far down the same slippery slope.

It may help you to know that R is most popular in the statistics community,
which of course is populated by statisticians, not programmers.

R isn't the easiest programming language to pick up, but it's far from the
hardest.  It's very similar to JavaScript, though a bit tougher to learn,
mostly due to having accumulated some strange syntax over its 36 years.
(That's counting R's predecessor, S.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread Oliver Peters

On Thu, 03 May 2012 15:35:46 CEST, Warren Young wrote:

On 5/1/2012 2:06 PM, peter korinis wrote:

Is SQLite the wrong tool for this project?


Probably.

SQLite is a data storage tool. With enough SQL cleverness, you can
turn it into a data *reduction* tool. But a data analysis tool? No,
not without marrying it to a real programming language.

Granted, that's what almost everyone does do with SQLite, but if
you're going to learn a programming language, I'd recommend you learn
R, a language and environment made for the sort of problem you find
yourself stuck with. http://r-project.org/

There are several R GUIs out there. I like R Studio best:
http://www.rstudio.org/

You'll still find R Studio a sharp shock compared to Excel. And yes,
it will require some programming, and yes, I know you said you aren't
a programmer. But in the rest of the thread, it looks like people have
convinced you to use SQLite from the command line, typing in raw SQL
commands; guess what, that's programming. Not on the level of R code,
but R isn't far down the same slippery slope.

It may help you to know that R is most popular in the statistics
community, which of course is populated by statisticians, not
programmers.

R isn't the easiest programming language to pick up, but it's far from
the hardest. It's very similar to JavaScript, though a bit tougher to
learn, mostly due to having accumulated some strange syntax over its
36 years. (That's counting R's predecessor, S.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



I don't think that the question is "R or sqlite?". As you already wrote 
sqlite is only a container for more or less structured data (I prefer 
more structure) and that is the job it really does well. If he only 
needs to aggregate data the functionality sqlite offers is enough. If 
more complex analysis is needed (graphical or not) the combination of R 
with sqlite (i.e. via RODBC) would do a superb job.
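For what it's worth, the aggregation Oliver says SQLite handles well can be driven from any scripting language with a SQLite binding. A minimal Python sketch, with an invented table and column names standing in for the real 44M-row data:

```python
import sqlite3

# Tiny in-memory stand-in for the imported 44 GB table.
# Table and column names here are made up for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE claims (state TEXT, amount REAL)")
con.executemany("INSERT INTO claims VALUES (?, ?)",
                [("PA", 10.0), ("PA", 5.0), ("NJ", 7.5)])

# Ad hoc filtering/grouping/summing of the kind Excel chokes on at
# 44M rows -- all done inside SQLite itself.
rows = con.execute(
    "SELECT state, COUNT(*), SUM(amount) FROM claims "
    "GROUP BY state ORDER BY state").fetchall()
print(rows)  # [('NJ', 1, 7.5), ('PA', 2, 15.0)]
```

The same SELECT works unchanged in the sqlite3 shell; the script is only a wrapper around it.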


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread Warren Young

On 5/1/2012 2:06 PM, peter korinis wrote:

Is SQLite the wrong tool for this project?


Probably.

SQLite is a data storage tool.  With enough SQL cleverness, you can turn 
it into a data *reduction* tool.  But a data analysis tool?  No, not 
without marrying it to a real programming language.


Granted, that's what almost everyone does do with SQLite, but if you're 
going to learn a programming language, I'd recommend you learn R, a 
language and environment made for the sort of problem you find yourself 
stuck with.  http://r-project.org/


There are several R GUIs out there.  I like R Studio best: 
http://www.rstudio.org/


You'll still find R Studio a sharp shock compared to Excel.  And yes, it 
will require some programming, and yes, I know you said you aren't a 
programmer.  But in the rest of the thread, it looks like people have 
convinced you to use SQLite from the command line, typing in raw SQL 
commands; guess what, that's programming.  Not on the level of R code, 
but R isn't far down the same slippery slope.


It may help you to know that R is most popular in the statistics 
community, which of course is populated by statisticians, not programmers.


R isn't the easiest programming language to pick up, but it's far from 
the hardest.  It's very similar to JavaScript, though a bit tougher to 
learn, mostly due to having accumulated some strange syntax over its 36 
years.  (That's counting R's predecessor, S.)

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread peter korinis
Not sure yet ... but I'm working on it (between interruptions).
Thanks


-Original Message-
>From: "Black, Michael (IS)" 
>Sent: May 2, 2012 10:15 AM
>To: General Discussion of SQLite Database 
>Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>Does that mean using the CLI worked for you?
>
>
>
>If so, you may be able to access the database with the other programs AFTER 
>you create it.
>
>Seems that creating a db from csv is challenging to some and not thoroughly 
>tested for large data sources.
>
>
>
>Michael D. Black
>
>Senior Scientist
>
>Advanced Analytics Directorate
>
>Advanced GEOINT Solutions Operating Unit
>
>Northrop Grumman Information Systems
>
>
>From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
>behalf of peter korinis [kori...@earthlink.net]
>Sent: Wednesday, May 02, 2012 9:06 AM
>To: 'General Discussion of SQLite Database'
>Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>Thank you all.
>Look like I'm stuck with the CLI though I have contacted Nucleon software
>support ... tried CLI yesterday but need more practice.
>Is there a good reference book you would recommend for SQLite?
>
>peter
>
>-Original Message-
>From: sqlite-users-boun...@sqlite.org
>[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Black, Michael (IS)
>Sent: Tuesday, May 01, 2012 4:22 PM
>To: General Discussion of SQLite Database
>Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>You need to try and do an import from the shell.  GUIs seem to have way too
>many limits.
>
>http://sqlite.org/download.html
>
>
>
>Don't do any indexes up front ... do them afterwards if they'll help your
>queries.  Indexes will slow down your import notably.
>
>
>
>I don't think you're anywhere near the limits of sqlite since it talks about
>terabytes.
>
>http://sqlite.org/limits.html
>
>
>
>Somebody else can answer for sure but wrapping your .import inside a
>transaction may be a good thing.
>
>I don't know if that's done by default.
>
>
>
>Your queries are liable to be pretty slow depending on what you have to do.
>
>
>
>
>
>
>
>
>
>Michael D. Black
>
>Senior Scientist
>
>Advanced Analytics Directorate
>
>Advanced GEOINT Solutions Operating Unit
>
>Northrop Grumman Information Systems
>
>
>From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
>behalf of peter korinis [kori...@earthlink.net]
>Sent: Tuesday, May 01, 2012 3:06 PM
>To: sqlite-users@sqlite.org
>Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file
>
>I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
>dev support for a pilot project (single user, no updates, just queries).
>
>
>
>I want to analyze the data contained in a 44GB csv file with 44M rows x 600
>columns (fields all <15 char). Seems like a DBMS will allow me to query it
>in a variety of ways to analyze the data.
>
>
>
>I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
>with 4GB RAM + 200GB free disk space.
>
>End-user tools like Excel & Access failed due to lack of memory. I
>downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
>but it would not load the csv files - 'csv worker failed'. So I tried
>Database Master from Nucleon but it failed after loading (it took 100
>minutes) ~57,000 rows with error message = 'database or disk is full". I
>tried to create another table in the same db but could not with same error
>message. The DB size shows as 10,000KB (that looks suspiciously like a size
>setting?).
>
>
>
>From what I've read SQLite can handle this size DB. So it seems that either
>I do not have enough RAM or there are memory/storage (default) limits or
>maybe time-out issues that prevent loading this large file . or the 2 GUI
>tools I tried have size limits. I do have a fast server (16GB, 12 procs,
>64-bit intel, Win server) and an iMAC available.
>
>
>
>1.   Is SQLite the wrong tool for this project? (I don't want the
>overkill and admin overhead of a large MySQL or SQL Server, etc.)
>
>2.   If SQLite will work, are there configuration settings in SQLite or
>Win7 that will permit the load . or is there a better tool for this project?
>
>
>
>Thanks much for helping a newbie!
>
>
>
>peterK
>
>
>
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>___
>sqlite-users mailing list

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread Black, Michael (IS)
Does that mean using the CLI worked for you?



If so, you may be able to access the database with the other programs AFTER you 
create it.

Seems that creating a db from csv is challenging to some and not thoroughly 
tested for large data sources.



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of peter korinis [kori...@earthlink.net]
Sent: Wednesday, May 02, 2012 9:06 AM
To: 'General Discussion of SQLite Database'
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file

Thank you all.
Look like I'm stuck with the CLI though I have contacted Nucleon software
support ... tried CLI yesterday but need more practice.
Is there a good reference book you would recommend for SQLite?

peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Black, Michael (IS)
Sent: Tuesday, May 01, 2012 4:22 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

You need to try and do an import from the shell.  GUIs seem to have way too
many limits.

http://sqlite.org/download.html



Don't do any indexes up front ... do them afterwards if they'll help your
queries.  Indexes will slow down your import notably.



I don't think you're anywhere near the limits of sqlite since it talks about
terabytes.

http://sqlite.org/limits.html



Somebody else can answer for sure but wrapping your .import inside a
transaction may be a good thing.

I don't know if that's done by default.



Your queries are liable to be pretty slow depending on what you have to do.









Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
behalf of peter korinis [kori...@earthlink.net]
Sent: Tuesday, May 01, 2012 3:06 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file

I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
dev support for a pilot project (single user, no updates, just queries).



I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.



I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading (it took 100
minutes) ~57,000 rows with error message = 'database or disk is full". I
tried to create another table in the same db but could not with same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).



From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file . or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit intel, Win server) and an iMAC available.



1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load . or is there a better tool for this project?



Thanks much for helping a newbie!



peterK



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread peter korinis
Thank you all. 
Looks like I'm stuck with the CLI, though I have contacted Nucleon software
support ... tried the CLI yesterday but need more practice.
Is there a good reference book you would recommend for SQLite?

peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Black, Michael (IS)
Sent: Tuesday, May 01, 2012 4:22 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

You need to try and do an import from the shell.  GUIs seem to have way too
many limits.

http://sqlite.org/download.html



Don't do any indexes up front ... do them afterwards if they'll help your
queries.  Indexes will slow down your import notably.



I don't think you're anywhere near the limits of sqlite since it talks about
terabytes.

http://sqlite.org/limits.html



Somebody else can answer for sure but wrapping your .import inside a
transaction may be a good thing.

I don't know if that's done by default.



Your queries are liable to be pretty slow depending on what you have to do.









Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
behalf of peter korinis [kori...@earthlink.net]
Sent: Tuesday, May 01, 2012 3:06 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file

I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
dev support for a pilot project (single user, no updates, just queries).



I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.



I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading (it took 100
minutes) ~57,000 rows with error message = 'database or disk is full". I
tried to create another table in the same db but could not with same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).



From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file . or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit intel, Win server) and an iMAC available.



1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load . or is there a better tool for this project?



Thanks much for helping a newbie!



peterK



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread Oliver Peters

On 01.05.2012 22:06, peter korinis wrote:

I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
dev support for a pilot project (single user, no updates, just queries).



I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all<15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.



I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel&  Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading (it took 100
minutes) ~57,000 rows with error message = 'database or disk is full". I
tried to create another table in the same db but could not with same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).




From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file . or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit intel, Win server) and an iMAC available.



1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load . or is there a better tool for this project?



Thanks much for helping a newbie!



peterK



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



Hello - reading your post I'd say you're on the same path I started a 
few years ago :-)


You need the following knowledge to work with databases (including Access):

- database design (keyword: normalization)

- SQL (I mean writing your own code to really understand what you're doing)

- a scripting language to preprocess your data (e.g. your data might be 
in a fixed-length format, ...). I recommend gawk because it's powerful 
and simple.


- it might be necessary to inspect your csv file if you don't know the 
structure (is there a header? what separator is used? ...), so under 
Windows I'd try to print the first 10 lines into a new file and use an 
editor to inspect it, or use a pager program like "less" that can open 
your 44 GB file in a second (don't use Notepad for this task ;-))
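That inspection step can also be scripted. A minimal Python sketch, with an invented pipe-delimited sample standing in for the head of the real file; note that `csv.Sniffer` is only a heuristic, so its guess should be checked by eye:

```python
import csv
import io

# Stand-in for the first few lines of the 44 GB file; in practice you
# would read them with open(path) -- never load the whole file at once.
sample = "ID|NAME|AMOUNT\n1|alpha|10\n2|beta|20\n"

# Peek at the head only, like "head -10 file.csv" or a pager would.
for line in io.StringIO(sample).readlines()[:10]:
    print(line, end="")

# Guess the delimiter (restricted to plausible candidates) and whether
# the first row is a header.
dialect = csv.Sniffer().sniff(sample, delimiters=",|;\t")
has_header = csv.Sniffer().has_header(sample)
print(dialect.delimiter, has_header)
```

Knowing the separator and header up front decides what `.separator` to set in the sqlite3 shell and whether to skip the first row.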


In a few words: a database is not just a more powerful spreadsheet program

greetings
Oliver
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-01 Thread Baruch Burstein
It is already wrapped in a transaction.
I seem to remember seeing somewhere that the .import command doesn't
understand escaping, e.g.

"one","two,three"

will get imported as

"one" | "two | three"

(the quotes are part of the data, and the second column was split into two
by the comma)
Just a point to be aware of.
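A minimal Python sketch of that failure mode, using the standard csv module (whether a particular shell version honors quotes should be checked against its documentation; the sample line is the one above):

```python
import csv
import io
import sqlite3

line = '"one","two,three"\n'

# Naive split -- effectively what a quote-unaware importer does:
naive = line.rstrip("\n").split(",")
print(naive)  # ['"one"', '"two', 'three"']   (wrong: 3 fields)

# The csv module honors the quoting, so the embedded comma survives:
row = next(csv.reader(io.StringIO(line)))
print(row)    # ['one', 'two,three']          (right: 2 fields)

# Parameterized insert keeps the comma intact all the way into SQLite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b TEXT)")
con.execute("INSERT INTO t VALUES (?, ?)", row)
stored = con.execute("SELECT a, b FROM t").fetchone()
print(stored)  # ('one', 'two,three')
```

This is the same idea as Valentin's gawk-to-INSERT suggestion: parse the CSV properly yourself instead of trusting the importer's splitting.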

On Tue, May 1, 2012 at 11:22 PM, Black, Michael (IS)  wrote:

> You need to try and do an import from the shell.  GUIs seem to have way
> too many limits.
>
> http://sqlite.org/download.html
>
>
>
> Don't do any indexes up front ... do them afterwards if they'll help your
> queries.  Indexes will slow down your import notably.
>
>
>
> I don't think you're anywhere near the limits of sqlite since it talks
> about terabytes.
>
> http://sqlite.org/limits.html
>
>
>
> Somebody else can answer for sure but wrapping your .import inside a
> transaction may be a good thing.
>
> I don't know if that's done by default.
>
>
>
> Your queries are liable to be pretty slow depending on what you have to do.
>
>
>
>
>
>
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> Advanced Analytics Directorate
>
> Advanced GEOINT Solutions Operating Unit
>
> Northrop Grumman Information Systems
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org]
> on behalf of peter korinis [kori...@earthlink.net]
> Sent: Tuesday, May 01, 2012 3:06 PM
> To: sqlite-users@sqlite.org
> Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file
>
> I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
>
>
>
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data.
>
>
>
> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel
> dual-proc
> with 4GB RAM + 200GB free disk space.
>
> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).
>
>
>
> From what I've read SQLite can handle this size DB. So it seems that either
> I do not have enough RAM or there are memory/storage (default) limits or
> maybe time-out issues that prevent loading this large file . or the 2 GUI
> tools I tried have size limits. I do have a fast server (16GB, 12 procs,
> 64-bit intel, Win server) and an iMAC available.
>
>
>
> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)
>
> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load . or is there a better tool for this
> project?
>
>
>
> Thanks much for helping a newbie!
>
>
>
> peterK
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Programming today is a race between software engineers striving to build
bigger and better idiot-proof programs, and the Universe trying to produce
bigger and better idiots. So far, the Universe is winning.  - Rich Cook
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-01 Thread Black, Michael (IS)
You need to try and do an import from the shell.  GUIs seem to have way too 
many limits.

http://sqlite.org/download.html



Don't do any indexes up front ... do them afterwards if they'll help your
queries.  Indexes will slow down your import notably.



I don't think you're anywhere near the limits of sqlite since it talks about 
terabytes.

http://sqlite.org/limits.html



Somebody else can answer for sure but wrapping your .import inside a 
transaction may be a good thing.

I don't know if that's done by default.
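If the load is driven from a script instead of the shell's .import, the transaction can be made explicit. A minimal Python sketch (the file contents and column names are invented):

```python
import csv
import io
import sqlite3

# isolation_level=None gives manual transaction control: no implicit BEGINs.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (a TEXT, b TEXT)")

# Stand-in for the 44 GB csv; in practice: open(path, newline="").
data = io.StringIO("a,b\nx,1\ny,2\nz,3\n")
reader = csv.reader(data)
next(reader)                      # skip the header row

con.execute("BEGIN")              # one transaction for the whole load...
con.executemany("INSERT INTO t VALUES (?, ?)", reader)
con.execute("COMMIT")             # ...so there is one sync, not one per row

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 3
```

One transaction around the bulk insert avoids a journal sync per row, which is where most of the import time goes on a load this size.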



Your queries are liable to be pretty slow depending on what you have to do.









Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of peter korinis [kori...@earthlink.net]
Sent: Tuesday, May 01, 2012 3:06 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file

I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
dev support for a pilot project (single user, no updates, just queries).



I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.



I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading (it took 100
minutes) ~57,000 rows with error message = 'database or disk is full". I
tried to create another table in the same db but could not with same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).



From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file . or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit intel, Win server) and an iMAC available.



1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load . or is there a better tool for this project?



Thanks much for helping a newbie!



peterK



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-01 Thread Baruch Burstein
If none of your fields contain a comma, you can just use the sqlite3
terminal to load a csv file.

On Tue, May 1, 2012 at 11:06 PM, peter korinis wrote:

> I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
>
>
>
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data.
>
>
>
> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel
> dual-proc
> with 4GB RAM + 200GB free disk space.
>
> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).
>
>
>
> From what I've read SQLite can handle this size DB. So it seems that either
> I do not have enough RAM or there are memory/storage (default) limits or
> maybe time-out issues that prevent loading this large file . or the 2 GUI
> tools I tried have size limits. I do have a fast server (16GB, 12 procs,
> 64-bit intel, Win server) and an iMAC available.
>
>
>
> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)
>
> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load . or is there a better tool for this
> project?
>
>
>
> Thanks much for helping a newbie!
>
>
>
> peterK
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Programming today is a race between software engineers striving to build
bigger and better idiot-proof programs, and the Universe trying to produce
bigger and better idiots. So far, the Universe is winning.  - Rich Cook
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users