Re: [sqlite] LEFT JOIN: ON vs. WHERE

2019-10-27 Thread Gabor Grothendieck
The difference between these two is what happens when a row
of A has no matches in B.

select * from A left join B on A.Time = B.Time
select * from A left join B where A.Time = B.Time

In the first one the condition is carried out during the join
so if a row of A has no matches in B then the B part of the result
is filled out with NULL values.

In the second one A is joined with B and then the "where" is applied
afterwards so if there is no match to a row of A then that row of
A  will not appear at all in the result.

Here is an example in R,  Note that there was no match to the
third row of A in B so in the first case that row of A appears in the
result with
NULLs in the B column positions.  In the second case the third row of A
is dropped from the result.

> library(sqldf)
>
> # BOD comes with R
> A <- BOD[1:3, ]
> B <- BOD[1:2, ]
> A
  Time demand
118.3
22   10.3
33   19.0
> B
  Time demand
118.3
22   10.3
>
> sqldf("select * from A left join B on A.Time = B.Time")
  Time demand Time..3 demand..4
118.3   1   8.3
22   10.3   2  10.3
33   19.0  NANA
> sqldf("select * from A left join B where A.Time = B.Time")
  Time demand Time..3 demand..4
118.3   1   8.3
22   10.3   2  10.3


On Sun, Oct 27, 2019 at 6:09 PM Benjamin Asher  wrote:
>
> Hi there! My colleagues and I are trying to understand the role of ON 
> constraints vs WHERE clauses in JOINs. It seems both of the following work, 
> but I’m not really sure why:
>
> Query A:
>
> SELECT * FROM tab1 LEFT JOIN tab2 ON tab1.x=‘constant’;
>
>
> Query B:
>
> SELECT * FROM tab1 LEFT JOIN tab2 ON tab1.x=tab2.x WHERE tab1.x=‘constant’;
>
> Is there a difference between the two (function and performance)? Is there an 
> advantage to putting WHERE-type filtering in the ON constraint vs leaving it 
> in the WHERE clause for LEFT JOINs?
>
> Thanks!
>
> Ben
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Standard deviation last x entries

2019-10-20 Thread Gabor Grothendieck
Another approach is to question whether you really need the
standard deviation or if any measure of variation would do.

You could use the range, max(x) - min(x) , and just report it as the range
rather than the standard deviation.  Also range/4 approximates the
standard deviation (google the range rule)
although the approximation is not very good.



On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck
 wrote:
>
> There is a stdev function for sqlite here:
> https://www.sqlite.org/contrib//download/extension-functions.c?get=25
>
> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt  wrote:
> >
> > Am 12.10.2019 um 16:47 schrieb Bart Smissaert:
> > > Sorry, I forgot to tell that. It is date column with an integer number.
> > >
> > > ID xValue xDate
> > > 
> > > 1   130  40123
> > > 1   120  41232
> > > 1   140  40582
> > > 1   100  40888
> > > 1   110  42541
> > > 2   140  41225
> > > 2   130  41589
> > > 2   150  40872
> >
> > Because SQLite on Android has at least CTEs available,
> > (since Android 5 or so), I thought I make an attempt using them...
> >
> > For your above sample-set I've created a table Test this way
> > (with slightly changed Column-Names + different IDs with "gaps"
> > + a few more sample-records, to cover especially n=1 sets):
> >
> > Create Table Test(ID, xVal, xDat);
> >
> > Insert Into Test Values(3, 130, 40123);
> > Insert Into Test Values(3, 120, 41232);
> > Insert Into Test Values(3, 140, 40582);
> > Insert Into Test Values(3, 100, 40888);
> > Insert Into Test Values(3, 110, 42541);
> >
> > Insert Into Test Values(5, 140, 41225);
> > Insert Into Test Values(5, 130, 41589);
> > Insert Into Test Values(5, 150, 40872);
> >
> > Insert Into Test Values(6, 110, 41225);
> > Insert Into Test Values(6, 115, 41227);
> >
> > Insert Into Test Values(9, 105, 41225);
> >
> > After the above Table-Creation + Inserts, one should
> > define a "Base-View" first, which is then able to select
> > appropriate aggregates (respecting your "max-4" condition):
> >
> > Create View vw_StdDev_Max4_Aggregates As
> > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T
> > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc
> > Limit 4)
> > Group By ID;
> >
> > Only this view above has to be adjusted, according to your
> > real table- and column-names, but the following two "derived views",
> > will work after these changes as well, as long as you leave the
> > above view-name - and the field-names: (ID, a1, a2, n) intact.
> >
> > So what remains, is two more view-defs (one for the population-StdDev -
> > and one with a few slight changes, to cover the sample-StdDev as well.
> >
> > Create View vw_StdDev_Samples As
> > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", (
> >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All
> >  Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> > Limit 32
> >) Select s From r Order By i Desc Limit 1
> > ) s From vw_StdDev_Max4_Aggregates Where n>1
> > Union All
> > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1;
> >
> > Create View vw_StdDev_Populations As
> > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", (
> >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All
> >  Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> > Limit 32
> >) Select s From r Order By i Desc Limit 1
> > ) sigma From vw_StdDev_Max4_Aggregates Where n>1
> > Union All
> > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1;
> >
> > 
> > Ok, here the result I get on my test-table, when the 3 views
> > are in place, and the following gets executed:
> >
> > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations;
> > ID  n   sigma
> > 
> > 3   4   14.7901994577491
> > 5   3   8.16496580927733
> > 6   2   2.5
> > 9   1   0
> >
> >
> > Select ID, n_SampleSize, s From vw_StdDev_Samples;
> > ID  n   s
> > 
> > 3   4   17.0782512765993
> > 5   3   10.1
> > 6   2   3.53553390593274
> > 9   1   null
> >
> > Please note the returned null (vs. the real 0) in the last result
> > (done, because "unbiased estimates"

Re: [sqlite] Standard deviation last x entries

2019-10-18 Thread Gabor Grothendieck
There is a stdev function for sqlite here:
https://www.sqlite.org/contrib//download/extension-functions.c?get=25

On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt  wrote:
>
> Am 12.10.2019 um 16:47 schrieb Bart Smissaert:
> > Sorry, I forgot to tell that. It is date column with an integer number.
> >
> > ID xValue xDate
> > 
> > 1   130  40123
> > 1   120  41232
> > 1   140  40582
> > 1   100  40888
> > 1   110  42541
> > 2   140  41225
> > 2   130  41589
> > 2   150  40872
>
> Because SQLite on Android has at least CTEs available,
> (since Android 5 or so), I thought I make an attempt using them...
>
> For your above sample-set I've created a table Test this way
> (with slightly changed Column-Names + different IDs with "gaps"
> + a few more sample-records, to cover especially n=1 sets):
>
> Create Table Test(ID, xVal, xDat);
>
> Insert Into Test Values(3, 130, 40123);
> Insert Into Test Values(3, 120, 41232);
> Insert Into Test Values(3, 140, 40582);
> Insert Into Test Values(3, 100, 40888);
> Insert Into Test Values(3, 110, 42541);
>
> Insert Into Test Values(5, 140, 41225);
> Insert Into Test Values(5, 130, 41589);
> Insert Into Test Values(5, 150, 40872);
>
> Insert Into Test Values(6, 110, 41225);
> Insert Into Test Values(6, 115, 41227);
>
> Insert Into Test Values(9, 105, 41225);
>
> After the above Table-Creation + Inserts, one should
> define a "Base-View" first, which is then able to select
> appropriate aggregates (respecting your "max-4" condition):
>
> Create View vw_StdDev_Max4_Aggregates As
> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T
> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc
> Limit 4)
> Group By ID;
>
> Only this view above has to be adjusted, according to your
> real table- and column-names, but the following two "derived views",
> will work after these changes as well, as long as you leave the
> above view-name - and the field-names: (ID, a1, a2, n) intact.
>
> So what remains, is two more view-defs (one for the population-StdDev -
> and one with a few slight changes, to cover the sample-StdDev as well.
>
> Create View vw_StdDev_Samples As
> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", (
>With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All
>  Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> Limit 32
>) Select s From r Order By i Desc Limit 1
> ) s From vw_StdDev_Max4_Aggregates Where n>1
> Union All
> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1;
>
> Create View vw_StdDev_Populations As
> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", (
>With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All
>  Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> Limit 32
>) Select s From r Order By i Desc Limit 1
> ) sigma From vw_StdDev_Max4_Aggregates Where n>1
> Union All
> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1;
>
> 
> Ok, here the result I get on my test-table, when the 3 views
> are in place, and the following gets executed:
>
> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations;
> ID  n   sigma
> 
> 3   4   14.7901994577491
> 5   3   8.16496580927733
> 6   2   2.5
> 9   1   0
>
>
> Select ID, n_SampleSize, s From vw_StdDev_Samples;
> ID  n   s
> 
> 3   4   17.0782512765993
> 5   3   10.1
> 6   2   3.53553390593274
> 9   1   null
>
> Please note the returned null (vs. the real 0) in the last result
> (done, because "unbiased estimates" are undefined for sample-size == 1)
>
> Finally for those interested, the CTE-based "iterated Square-Root",
> cut out for an isolated calculation of sqrt(2).
>
> With r(s2, s, i) As (Select 2, 1, 1 Union All
>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> Limit 32
> ) Select s From r Order By i Desc Limit 1
>
> The max. iterations are limited to 32 - though for input-values
> in a "typical range", the iteration will usually exit earlier...
> In case of the example above -> sqrt(2), the iteration-counter
> i reached only 6 (as can be seen, when you include i alongside s
> in the final Select.
>
> With r(s2, s, i) As (Select 2, 1, 1 Union All
>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> Limit 32
> ) Select i, s From r Order By i Desc Limit 1
>
> HTH
>
> Olaf
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-12 Thread Gabor Grothendieck
Thanks for the idea but the interface already handles that and does it
without special names. The last example in my last post shows that
d was correctly typed in the output because the interface noticed that
it had the same name as an input column.  Other problems are that it
would still not handle propagation through expressions and would require
that the user use special names different than the names in the
input.

I appreciate these ideas but these or equally effective alternatives are
already implemented and it is precisely these kludges that I was
trying to avoid.  With one R statement the user can switch back ends
so unless sqlite works as smoothly as the alternative backends
a user will choose one of those if they are doing a lot of date and
datetime processing.

For many applications the other advantages
of sqlite would take precedence.
The fact that sqlirte is included right in
the R driver package is very convenient as it means there is nothing
additional to install
beyond the R driver.  (H2 is also included in the R driver but in that
case java needs to be installed.) Also the new windowing functions, CTEs
and other features are great.
Unfortunately in the widely applicable case of dates and date times
the other databases just work and additional care needs to be taken
with sqlite.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-12 Thread Gabor Grothendieck
The whole point of this is to make it as easy as possible for the user.
With other backends the database handles the types but with sqlite
the user has to get involved.

It is not a matter of storage.  It is a matter of maintaining the type
information
on the database side and passing the type information back to R.

Perhaps an example would help.  Consider this.  Here R passes the
type information to the H2 database backend and H2 passes it back to R
so that the output is correctly typed.  This works as desired.

  library(RH2) # load H2 driver and H2 database
  library(sqldf) # load package

  DF <- data.frame(d = as.Date("2000-01-01"))  # create data.frame
with one column d

  sqldf("select d, d + 1 as nextDay from DF")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

Now if we switch from H2 to sqlite that would have not have worked.
Any of these workaournds
are currently implemented and work but are not optimal.

  library(sqldf)  # if no driver loaded it assumes sqlite

  # Alternative 1
  # method = "Date" means all returned numbers are regarded as Date

  sqldf("select d, d + 1 as nextDay from DF", method = "Date")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

 # Alternative 2
  # this is similar to what Simon suggested.
  # method = "name__class" means use the column name suffix
  # to determine the type

  sqldf("select d as d__Date, d + 1 as nextDay__Date from DF", method
= "name__class")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

 # Alternative 3
  # convert it manually.  It deduces that d is Date because an input
  # column of the same name exists as Date but that is not foolproof
  # and does not, for example, work for the nextDay column so we
  # convert it manually

  out <- sqldf("select d, d + 1 as nextDay from DF")
  out$nextDay <- as.Date(out$nextDay, origin = "1970-01-01")
  out
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

On Sun, Aug 11, 2019 at 7:16 PM Keith Medcalf  wrote:
>
>
> On Sunday, 11 August, 2019 07:45, Gabor Grothendieck 
>  wrote:
>
> >R supports Date and POSIXct (date/time) classes which are represented
> >internally as days and seconds since the UNIX Epoch respectively;
> >however, due to the class it knows to display and manipulate them as
> >dates and datetimes rather than numbers.
>
> SQLite3 can store integers and doubles.  Seems to me this is quite sufficient 
> for storing a "number of seconds" since the Unix Epoch or "number of days" 
> since the Unix epoch.  What exactly is the problem?  An IEEE-754 double 
> precision floating point number is certainly capable of storing all offsets 
> from the Unix Epoch (either as a days or seconds offset) with far more 
> precision that a human is likely able to discern (or with more precision than 
> the accuracy of most atomic clocks, for that matter).
>
> >If sqldf sends a Date or POSIXct to SQLite then it is sent as a
> >number (days or seconds since the UNIX Epoch) but when it is sent
> >back it cannot know that that number is supposed to represent a
> >date or datetime.
>
> Why not?  The column type declaration in the table is merely an "arbitrary 
> string", and the returned column names from a select are merely "arbitrary 
> strings".  It seems like an "application deficiency" that it cannot set and 
> retrieve "arbitrary metadata" for its own internal use (such as providing 
> column type declarations in the CREATE TABLE or additional metadata in the 
> column name (in select statements).  Many other "applications" do so without 
> difficulty (for example the default sqlite3 wrapper in Python).  Are the R 
> programmers somehow particularly deficient in this regard?
>
> I quite often store "timestamps" using application dependent epochs and 
> offsets without difficulty (for example, the number of one-minute intervals 
> since the Unix epoch).  Given that just about every programming system and 
> language ever invented seems to store datetime data as some type of offset 
> from some epoch, and each of them different, does not seem to bother 
> interoperability in the least.  Once you know the Epoch and Offset interval, 
> conversion is rather simple arithmetic that most children learned in primary 
> school.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
> lot about anticipated traffic volume.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-11 Thread Gabor Grothendieck
Actually sqldf has multiple heuristics and the one you suggested is
already one of them
(except for minor differences in syntax) but this has the disadvantage
that the user must
specify classes whereas if the user simply uses any of the other backends
 they don't have to.

On Sun, Aug 11, 2019 at 10:52 AM Simon Slavin  wrote:
>
> On 11 Aug 2019, at 2:45pm, Gabor Grothendieck  wrote:
>
> > R supports Date and POSIXct (date/time) classes which are represented
> > internally as days and seconds since the UNIX Epoch respectively;
> > however, due to the class it knows to display and manipulate them as
> > dates and datetimes rather than numbers.
> >
> > If sqldf sends a Date or POSIXct to SQLite then it is sent as a number 
> > (days or seconds since the UNIX Epoch) but when it is sent back it cannot 
> > know that that number is supposed to represent a date or datetime.   There 
> > are some default heuristics (if any of the input tables have a column name 
> > the same name as an output column name then such output columns are 
> > automatically converted) but this is far from foolproof.
>
> Add a new heuristic that if the column name ends in '_Date' or '_
> POSIXct' then the value should be converted.  When R creates a new SQLite 
> table, it should append '_Date' or '_POSIXct' to the column name it chooses 
> for classes of those types.  Existing heuristics should continue to work.
>
> I do understand the value of having date/time types in SQLite, but it is not 
> easy to do while retaining backward compatibility.  It'll have to wait for 
> SQLite4 or something.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] dates, times and R

2019-08-11 Thread Gabor Grothendieck
It's really useful that SQLite now supports window operations as that was
one of the key features that R users needed to do many manipulations
using SQLite.

From the perspective of the R language there is really one
particularly key feature left that prevents some users from easily
using SQLite in many cases.  That is the lack of date and time types.

From the perspective of the sqldf package in R, the user writes:

sqldf("select ...whatever...")

and sqldf creates an empty sqlite data base, looks for all table names
in the select statement, generates create table statements for them,
uploads the tables to the new database, runs the statement, downloads
the result and deletes the database. (This package uses a lower level
R driver package for sqlite which may also be used directly.)

R supports Date and POSIXct (date/time) classes which are represented
internally as days and seconds since the UNIX Epoch respectively;
however, due to the class it knows to display and manipulate them as
dates and datetimes rather than numbers.

If sqldf sends a Date or POSIXct to SQLite then it is sent as a number (days or
seconds since the UNIX Epoch) but when it is sent back it cannot know that
that number is supposed to represent a date or datetime.   There are some
default heuristics (if any of the input tables have a column name the same name
as an output column name then such output columns are automatically
converted) but this is far from foolproof.

This is not a problem for the other backend databases  that are
supported since those backends have true date and datetime types so
when R receives such objects it knows to convert them to R's similar types.
Currently SQLite is the default backend but I normally recommend that users
switch to the java based H2 backend if they are doing a lot of date and datetime
processing for the above reason. A lot of data analysis does involve
dates and date times so this covers a lot of applications.

If SQLite were to support true date and datetime types, as do other databases,
and not just functions which handle numbers as if they were dates or
datetimes this problem would be addressed so this is a key remaining feature
that I think SQLite needs.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple way to import GPX file?

2018-12-09 Thread Gabor Grothendieck
The csvfix command line utility is useful to edit such files down to a csv file
which can then be read into sqlite.

  csvfix from_xml -smq -re wpt gpx.xml

On Sun, Dec 9, 2018 at 4:44 PM Winfried  wrote:
>
> Hello,
>
> I need to importe a GPX file that contains a few thousand waypoints, eg.
> Some name
>
> I tried https://mygeodata.cloud and https://geoconverter.hsr.ch, but for
> some reason, the "waypoints" table doesn't contain latitude + longitude
> infos.
>
> Before I write a Python script, is there a simple way to import GPX data
> into SQLite?
>
> Thank you.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-02-02 Thread Gabor Grothendieck
> It is a standard feature of pysqlite2 (the sqlite3 library shipped with 
> Python).  I am quite sure you can read the documentation just as well as I 
> can copy and paste it.  In short, you can use anything you like as the data 
> type in sqlite.  You could specify that some column contains:
>
> create table Developers
> (
>   count sillyGooses integer not null
> );
>
> You have now defined this to be of type sillyGooses integer.  It will have 
> affinity integer.  WHen you read the column metadata you can find out that 
> count is of type sillygooses and multiply the returned value count by Pi/e if 
> that is the conversion factor required.
>
> THe person writing the interface can even do this magically for you -- just 
> like pysqlite / sqlite3 python libraries do.
>

It's hard to tell what you are suggesting without the full example I
asked for but given that

1. you are showing a create statement and
2. refering to the "person writing the interface"

I assume  you are suggesting that the user define the structure of the
output table sent from sqlite and do that either on the python or
sqlite side but that was what we were trying to avoid.

Of course it's possible (and easy to do in R too) if the user is
required to specify the output types on the sqlite or R side but in R
you don't have to -- it's all done automatically for you.  In the full
R example I showed there is no create statement or other statements to
define an the "interface" that I did not show yet it works in
databases that support date types but not in sqlite (unless you rely
on heuristics that the sqldf package provides).


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-02-01 Thread Gabor Grothendieck
On Mon, Feb 1, 2016 at 8:27 AM, Keith Medcalf  wrote:
>
>> > But you already have pandas.read_sql_query.  While that function
>> > isn't really what I'd call simple, the complexity afaict -- dates,
>> > floats, and chunks -- can be laid at Python's feet.
>>
>> I use R rather than python but the problem of dates is significant and
>> I assume the same problem as in python.  Consider this query:
>>
>>  select myDate + 1 from myTable
>>
>> Assume that myTable comes from an R data frame with a myDate column
>> that has R class of "Date".  Internally R stores the myDate column as
>> days since 1970-01-01 and since SQLite has no Date class it sends that
>> number to SQLite.  The select statement then adds 1 giving a number
>> representing the next day but when one gets it back to R how can R
>> know that that number is intended to represent a Date and so convert
>> it to R's Date class?  In fact it can't.   With databases that have a
>> Date type this is not a problem but it is a significant problem with
>> SQLite.
>
> What is the problem exactly?  Sounds like the R thingy is broken.  I do this 
> all the time using Python.  You simply have to compile the library with 
> column_metadata and then use it.  Sheesh.  Even pysqlite has done this just 
> peachyu fine for about a decade.
>

> There are some hackish workarounds.  For example, consider this
>> self-contained reproducible R code:
>
>> library(sqldf)  # load package and dependencies
>> myTable <- data.frame(myDate = as.Date("2000-01-31"))  # define myTable
>> sqldf("select myDate + 1 myDate from myTable")
>
>> sqldf will use the heuristic of assuming that the myDate in the output
>> has the same class as the myDate in the input (because they have the
>> same name) and so will convert the output myDate column to the R
>> "Date" class but the ability of such a heuristic to work is fairly
>> limited.
>
>> It is also possible to specify to sqldf the class of each output
>> column but this is tedious and puts SQLite at a disadvantage since it
>> is unnecessary if used with a different backend database that is able
>> to return a Date class column.
>
>> With numeric and character columns there is not much problem but as
>> soon as one gets to date and date/time columns then this impedence
>> mismatch appears and is one fo the main reasons an R user might decide
>> to use a different backend.
>
> There is no impedence mismatch.  Simply inadequate wattage by the person(s) 
> solving the problem.  As I said, this problem has been solved with SQLite and 
> Python for a decade.  So I would suggest the problem is that the wattage was 
> so low, the lights were completely out.
>

The impedence in the example is that on the sqlite side the type is
numeric and on the R side it is of Date class.  It is impossible to
know which sqlite numeric types are intended to represent dates and
which are intended to represent numbers so when they are sent back to
R there is no automatic assignment to Date class.  Either the
programmer has to specify it (which is tedious) or else one has to
rely on heuristics such as assuming that any output field having the
same name as input field must also have the same type (but such
heuristics do not cover all cases).

If you believe that python can know that a date is being returned from
sqlite without the programmer specifying it even though sqlite has no
date class, I suggest providing a self contained reproducible example
of python code to illustrate it as I did with R.


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-31 Thread Gabor Grothendieck
On Sun, Jan 31, 2016 at 6:25 PM, James K. Lowden
 wrote:
> On Sat, 30 Jan 2016 20:50:17 -0500
> Jim Callahan  wrote:
> But you already have pandas.read_sql_query.  While that function
> isn't really what I'd call simple, the complexity afaict -- dates,
> floats, and chunks -- can be laid at Python's feet.

I use R rather than python but the problem of dates is significant and
I assume the same problem as in python.  Consider this query:

 select myDate + 1 from myTable

Assume that myTable comes from an R data frame with a myDate column
that has R class of "Date".  Internally R stores the myDate column as
days since 1970-01-01 and since SQLite has no Date class it sends that
number to SQLite.  The select statement then adds 1 giving a number
representing the next day but when one gets it back to R how can R
know that that number is intended to represent a Date and so convert
it to R's Date class?  In fact it can't.   With databases that have a
Date type this is not a problem but it is a significant problem with
SQLite.

There are some hackish workarounds.  For example, consider this
self-contained reproducible R code:

library(sqldf)  # load package and dependencies
myTable <- data.frame(myDate = as.Date("2000-01-31"))  # define myTable
sqldf("select myDate + 1 myDate from myTable")

sqldf will use the heuristic of assuming that the myDate in the output
has the same class as the myDate in the input (because they have the
same name) and so will convert the output myDate column to the R
"Date" class but the ability of such a heuristic to work is fairly
limited.

It is also possible to specify to sqldf the class of each output
column but this is tedious and puts SQLite at a disadvantage since it
is unnecessary if used with a different backend database that is able
to return a Date class column.

With numeric and character columns there is not much problem but as
soon as one gets to date and date/time columns then this impedence
mismatch appears and is one fo the main reasons an R user might decide
to use a different backend.


[sqlite] Best way to store only date

2016-01-30 Thread Gabor Grothendieck
I frequently have to deal with dates coming from R's "Date" class
which stores dates as the number of days since the UNIX epoch.  So if
x is the number of days since 1970-01-01 then this gives the
-mm-dd representation of the date

date(x + 2440588)

and (annoyingly owing to the need for the 0.5) this is the inverse
(i.e. if y is the result of the above expression then this returns x):

julianday(y) - 2440588 + 0.5

The 2440588 comes from the following but it's a bit verbose when you
are dealing with a lot of dates to keep writing this out:

   select julianday(date(0, "unixepoch"))


On Sat, Jan 30, 2016 at 9:31 AM, E.Pasma  wrote:
> 30-01-2016 14:59, R Smith:
>
>>
>>
>> On 2016/01/30 3:22 PM, E.Pasma wrote:
>>> The diagram got broken in my email and here is another try:
>>>
>>> Needs to be light | Needs to be| Needs to do  |
>>> (small footprint) | Human-Readable | calculations |
>>> - | ---|  |
>>> YES   | YES| NO   | Integer as
>>> ||  | Igor's suggestion
>>> ||  |
>>> YES   | NO | YES  | Float/Int
>>> ||  | Julianday
>>> ||  |
>>> NO| YES| YES  | Datetime/Numeric
>>> ||  | ISO Standard
>>
>> Thank you for the fix.
>>
>>>
>>> With respect to Igor's suggestion, mmdd (as integer), why not leave out
>>> the century? I prefer the oldfashoned yymmdd.
>>
>> When dealing with a localized context around the current period, a two
>> digit date is often enough - so if you see '12 or '16 or '20 you can
>> easily assume that to mean 2012, 2016 etc. But what if you see '51?
>> Would that be 1951 or 2051?
>> The context would probably enlighten the meaning, but it's best to leave
>> context to the users and not pre-empt it during the design phase. A
>> four-digit year is best for standard human reference. (You won't need to
>> also add the AD. bit)  :)
>>
> My private use is adding yymmdd as an extension to document names on my
> computer. You can leave it to me to know what 550501 refers to.
> Professionally this sounds very bad, agreed with the others. But a system
> may deal with a two-digit date according to clear rules. See the Oracle RR
> date format
>
>  http://oracleeducation.blogspot.nl/2007/05/oracle-date-format_22.html
>
> But I may bring this up after a couple of decades again :-)
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[sqlite] Parsing the contents of a field

2016-01-13 Thread Gabor Grothendieck
If it's OK to use the sqlite3 cmd line shell then try this:

-- create test input table X
create table X (what text);
insert into X values ('abc,def');

-- write X to a file
.output mydata.csv
select * from X;
.output stdout

-- read it back in to parse it
create table Y (a text, b text);
.mode csv
.import mydata.csv Y

-- check Y - add a limit clause if Y is large
select * from Y;


On Wed, Jan 13, 2016 at 12:42 AM, audio muze  wrote:
> I have a table of roughly 500k records with a number of fields
> containing delimited text that needs to be parsed and written to
> separate tables as a master lists.  In order to do this I need to
> parse the field contents, however, I don't see any functions within
> SQLite to enable that.  The number of delimited entries embedded in a
> field can vary from none to as man as 20/30.  Is there an addin I can
> compile with SQLite that provides the ability to parse a string?
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-08 Thread Gabor Grothendieck
This is how R works too.  That is the RSQLite package that gives
access to SQLite includes SQLite itself right in the package itself so
one need not separately install SQLite.

Also RSQlite uses the R DBI package which defines connections as
classes which are subclassed by the various database packages
(RSQLite, RMySQL, RPostgreSQL, etc.). Thus, in the case of RSQLite it
defines a connection subclass whose methods allow one to access
SQLite.

R also has the higher level sqldf package that lets one access R
data.frames (R data frames are like memory resident database tables)
as if they were tables in a relational database.  When sqldf is run it
creates an SQLite database (or other backend that might be used but
the default is SQLite), uploads any data frames referenced in the SQL
statement, performs the SQL statements and downloads the result
destroying the database.  SQLite is sufficiently fast that this is
often faster than performing the same operation in native R despite
having to upload the inputs and download the output.

For example, the following installs sqldf and its dependencies
(RSQLite, DBI) on the first line, loads them all into the current
session's workspace in the second line and then lists the first 4 rows
of the iris data frame (iris comes with R) using SQLite as the backend
and then defines a data.frame DF and performs another SQL statement:

install.packages("sqldf")
library(sqldf)

sqldf("select * from iris limit 4")

DF <- data.frame(a = 1:26, b = LETTERS)
sqldf("select * from DF where a > 10 limit 3")



On Fri, Jan 8, 2016 at 2:51 AM, Darren Duncan  
wrote:
> Stephen,
>
> What you are arguing for (no shared libraries) is bad old days where one had
> to recompile their programming language to add support for a DBMS, rather
> than the DBMS support being a separately installable library that one could
> choose to install or not or upgrade semi-independently or not, or choose to
> use an alternative or not.
>
> Sure, SQLite is public domain, but why should every language bundle it into
> their core just because?  There are lots of other useful libraries one could
> make the same argument for.  Bundling it can make sense if the language core
> itself depends on SQLite or practically all of its users would use it, but
> that's not usually the case.
>
> I should also point out that the standard Perl interface for SQLite, the
> DBD::SQLite module, bundles the SQLite source with it, so installing that
> Perl library gives you SQLite itself, there are no DLLs or dependence on
> some system SQLite library, but Perl itself doesn't have this built-in nor
> should it.
>
> In the Perl 4 days you had to recompile Perl to make a version that can talk
> to a DBMS, eg "Oraperl", but thankfully with Perl 5 (1994 or so) we did away
> with that.
>
> -- Darren Duncan
>
>
> On 2016-01-07 5:47 PM, Stephen Chrzanowski wrote:
>>
>> I personally wish the reverse.  I wish that these interpreted language
>> engines would incorporate the SQLite code directly into their own
>> existence
>> to avoid having to write wrappers to begin with, except for those wrappers
>> where their method name is "DatabaseOpen" and I prefer "OpenDatabase".
>>
>> SQLite has been around for years, and "R", PHP, Java, Perl, and all these
>> other interpreted new and old style languages have never bothered to
>> incorporate this public domain database engine within itself.  It isn't
>> like the maintainers of these languages don't know it doesn't exist, and
>> if
>> they didn't, then my god they gotta get out from under that rock.  Most
>> web
>> browsers use SQLite for crying out loud.
>>
>> For a few years, I've considered taking the entire amalgamation and
>> porting
>> it to Pascal (Delphi/FPC) so I have exactly zero reliance on DLLs.  No
>> worries about OBJ files, no worries about dependencies, I just include a
>> unit and my app is now database aware.  I know 386 assembly, and I can
>> always read up on other specifications if I needed to.  My problem is that
>> gaming gets in the way.
>>
>> My 2016 wish list for SQLite is that all developers who write for, or use
>> directly or indirectly, any database engine out on the market has a safe
>> and happy 2016 and beyond.
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[sqlite] SQLite Extensions

2015-11-20 Thread Gabor Grothendieck
Can you include in the distribution the precise code (make file or
.bat file, etc.) used to build these.  Thanks.

On Fri, Nov 20, 2015 at 12:42 AM, Keith Medcalf  wrote:
>
> For anyone who is interested, I have compiled for 32-bit Windows all the 
> extensions that are included in the SQLite3 distribution /ext/misc directory 
> using MinGW (gcc 4.8.1) and have added a few others I have written that add 
> useful functions, and even a few that are taken from other people -- see the 
> source for information.  They are all statically linked, require only the 
> standard subsystem runtime (MSVCRT) and standard windows system dlls.  There 
> are no other external dependancies.
>
> The modules that I wrote are:
>
> sqlfunc - Useful running statistical calculations, windows authorization 
> functions
> sqlmath - Access to underlying math library (sin/cos/tan etc) as sql functions
> sqlfcmp - functions to compare floating point numbers (default within 5 ULPs)
> sqlhash - windows cryptographic hash functions 
> (md2/md4/md5/sha/sha256/sha384/sha512)
>
> You can download the file at http://www.dessus.com/files/SQLiteExtensions.zip
>
> ---
> Life should not be a journey to the grave with the intention of arriving 
> safely in a pretty and well preserved body, but rather to skid in broadside 
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly 
> proclaiming "Wow! What a Ride!"
>  -- Hunter S. Thompson
>
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[sqlite] Is it legal SQL to select a specific row of a group via a HAVING clause?

2015-09-29 Thread Gabor Grothendieck
On Tue, Sep 29, 2015 at 6:16 AM, Simon Slavin  wrote:

> I don't like using sub-SELECT and I would love to see another reader
> rephrase this using 'WITH' or a VIEW.
>

with sub as (select currency, price, max(day) from prices group by currency)
 select currency, price from sub;


[sqlite] CSV excel import

2015-08-01 Thread Gabor Grothendieck
Here is how to do it in R.  Download, install and start R and then paste
the following code into R.

Uncomment the first line (the line starting with  a hash) if you don't
already have sqldf installed.  This not only installs sqldf but also the
RSQLite driver and SQLite itself.

The code assumes that mytable.csv is the input file, DF is the table name
to create and db  is the name of the SQLite database to use (or create).
Change these as needed.   It will use the first line of the input file as
the column names and will automatically determine the types of  columns by
examining the first few data rows of the input.

# install.packages("sqldf")

library(sqldf)
DF <- read.csv("mytable.csv")
sqldf(c("attach db as new", "create table new.DF as select * from DF"))


On Thu, Jul 30, 2015 at 1:58 PM, Sylvain Pointeau <
sylvain.pointeau at gmail.com> wrote:

> I understood from the mailing list, that CSV is not a defined format, then
> let's propose another format, well defined, the Excel one (which is in my
> experience a format to is good every time I had to exchange CSV files).
>
> Then why don't you propose an import of CSV from Excel (or similar)?
> csv(excel)
>
> is it possible? in a lot of cases, I cannot use sqlite (executable) because
> of the lack of a good CSV import. It would really great if this could be
> addressed.
>
> Best regards,
> Sylvain
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[sqlite] Thanks SQLite

2015-07-31 Thread Gabor Grothendieck
Both the solutions transformed the correlated subquery into a join prior to
forming the CTE.  Can we conclude, in general, that CTEs do not support
correlated subqueries?

On Fri, Jul 31, 2015 at 11:30 AM, R.Smith  wrote:

>
>
> On 2015-07-31 03:40 PM, Gabor Grothendieck wrote:
>
>> On Fri, Jul 31, 2015 at 8:40 AM, Simon Slavin 
>> wrote:
>>
>> I am noting an overlap with the things SQLite users have been using
>>> sub-SELECTs for.
>>>
>>>
>>> Here is a self contained example that can be fed into the sqlite3 command
>> line tool.  It uses WITH to factor out the subquery; however, the annoying
>> part about it is that even though we have factored out the subquery we
>> still have to include a simplified version of the subquery, viz.
>>(select avgSalary from A)
>> It would have been nice if we could just replace (select avgSalary from A)
>> with (A.avgSalary) or even (A) but those substitutions do not work:
>>
>
> Well, you can, sort-of, if you use the CTE column naming. Consider this
> simplified version of your Query (also using the CTE to do the actual
> aggregate work once only):
>
>create table Emp (emp text, salary real, dept text);
>insert into Emp values
>('a', 1, 'A'),
>('b', 2, 'A'),
>('c', 3, 'A'),
>('d', 1, 'B'),
>('e', 2, 'B'),
>('f', 3, 'B');
>
>
>WITH A(dept,avgSalary) as (
> SELECT Emp.dept, AVG(salary) FROM Emp GROUP BY Emp.dept
>)
>SELECT Emp.*
>   FROM Emp, A
>  WHERE Emp.dept = A.dept AND Emp.salary > A.avgSalary;
>
>
>   --  emp  | salary |  dept
>   -- - | -- | -
>   --   c   |   3.0  |   A
>   --   f   |   3.0  |   B
>
>
> -- based on query at: https://en.wikipedia.org/wiki/Correlated_subquery
>> create table Emp (emp text, salary real, dept text);
>> insert into Emp values ('a', 1, 'A');
>> insert into Emp values ('b', 2, 'A');
>> insert into Emp values ('c', 3, 'A');
>> insert into Emp values ('d', 1, 'B');
>> insert into Emp values ('e', 2, 'B');
>> insert into Emp values ('f', 3, 'B');
>> WITH A as
>>   (SELECT AVG(salary) AS avgSalary FROM Emp WHERE dept = e1.dept)
>>   SELECT *
>>   FROM Emp AS e1
>>   WHERE salary > (select avgSalary from A);
>>
>> Here is a sample run assuming the above has been placed into a file named
>> test.sqlite.
>>
>> C:\> sqlite3 < test.sqlite
>> c|3.0|A
>> f|3.0|B
>>
>> (I am using sqlite 3.8.3 on Windows 8.1.)
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Thanks SQLite

2015-07-31 Thread Gabor Grothendieck
On Fri, Jul 31, 2015 at 8:40 AM, Simon Slavin  wrote:

>
> I am noting an overlap with the things SQLite users have been using
> sub-SELECTs for.
>
>
Here is a self contained example that can be fed into the sqlite3 command
line tool.  It uses WITH to factor out the subquery; however, the annoying
part about it is that even though we have factored out the subquery we
still have to include a simplified version of the subquery, viz.
  (select avgSalary from A)
It would have been nice if we could just replace (select avgSalary from A)
with (A.avgSalary) or even (A) but those substitutions do not work:

-- based on query at: https://en.wikipedia.org/wiki/Correlated_subquery
create table Emp (emp text, salary real, dept text);
insert into Emp values ('a', 1, 'A');
insert into Emp values ('b', 2, 'A');
insert into Emp values ('c', 3, 'A');
insert into Emp values ('d', 1, 'B');
insert into Emp values ('e', 2, 'B');
insert into Emp values ('f', 3, 'B');
WITH A as
 (SELECT AVG(salary) AS avgSalary FROM Emp WHERE dept = e1.dept)
 SELECT *
 FROM Emp AS e1
 WHERE salary > (select avgSalary from A);

Here is a sample run assuming the above has been placed into a file named
test.sqlite.

C:\> sqlite3 < test.sqlite
c|3.0|A
f|3.0|B

(I am using sqlite 3.8.3 on Windows 8.1.)


[sqlite] PhD student

2015-02-26 Thread Gabor Grothendieck
On Wed, Feb 25, 2015 at 11:28 AM, VASILEIOU Eleftheria
 wrote:
> I would need to use R for my analysis for my Project and my supervisor 
> suggested me to learn the SQL language for R.
> Could you please provide me some resources for learning SQL and R?

Assuming you are looking to use SQL to work with R data.frames see
this link for numerous examples:

http://sqldf.googlecode.com


[sqlite] Appropriate Uses For SQLite

2015-02-19 Thread Gabor Grothendieck
On Wed, Feb 18, 2015 at 9:53 AM, Richard Hipp  wrote:
> On 2/18/15, Jim Callahan  wrote:
>> I would mention the open source statistical language R in the "data
>> analysis" section.
>
> I've heard of R but never tried to use it myself.  Is an SQLite
> interface built into R, sure enough?  Or is that something that has to
> be added in separately?
>

RSQLite is an add-on package to R; however, for data analysis (as
opposed to specific database manipulation) I would think most R users
would use my sqldf R add-on package (which uses RSQLite by default and
also can use driver packages of certain other databases) rather than
RSQLite directly if they were going to use SQL for that.

In R a data.frame is like an SQL table but in memory and sqldf lets
you apply SQL statements to them as if they were all one big SQLite
database.  A common misconception is it must be slow but in fact its
sufficiently fast that some people use it to get a speed advantage
over plain R.  Others use it to learn SQL or to ease the transition to
R and others use it allow them to manipulate R data frames without
knowing much about R provided they know SQL.

If you have not tried R this takes you through installing R and
running sqldf in about 5 minutes:
https://sqldf.googlecode.com/#For_Those_New_to_R

The rest of that page gives many other examples.


Re: [sqlite] Whish List for 2015

2014-12-23 Thread Gabor Grothendieck
On Sun, Dec 21, 2014 at 4:47 AM, big stone  wrote:
> Hi all,
>
> To prepare for  end of 2014 greetings moment, here is my whish list for
> 2015:
> - a minimal subset of analytic functions [1], that I hope may help
> end-user/students popularity [2]
> - better information on what is coming ahead, for example:
>. I see the 'sessions' tree moving along main tree since a few months,
> what is it about ?
>. sqlite4 is dead because sqlite3 did progress quicker than expected ?
>. 
>
> [1] http://www.postgresql.org/docs/9.4/static/tutorial-window.html
>
> [2] http://db-engines.com/en/ranking_trend
>

In R the sqldf R package allows one to perform SQL operations on R
data.frames by transparently uploading them to an SQL database which
it creates on the fly performing the specified SQL operation,
reeturning the result and destorying the database it created.  it uses
SQLite by default but can also use H2, MySQL and PostgreSQL.

Its my sense that users of sqldf use SQLite because its the default
and is automatically installed when they install sqldf but if they
move to one of the other databases the first choice is typically H2
because they want true date and time types which H2 supports and
sqlite does not.  Also, H2 is almost as easy to install as SQLite -
the R drivers for both packages include the entire database and the
only extra thing you need for H2 is java which many people have
already anyways.Thus if SQLite had true date and time types I
think this would address a significant portion of those sqldf users
who find SQLite is not enough for their needs.

If they don't go to H2 then they will likely go to PostgreSQL to get
window/analytic functions and the main thing that would put them off
here is that its not as easy to install PostgreSQL as SQLite or H2.
Thus if SQLite were to support window/analytic functions it would
address the main other group of those who switch from SQLite when
using sqldf.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What if OpenDocument were built atop SQLite?

2014-09-10 Thread Gabor Grothendieck
On Wed, Sep 10, 2014 at 8:36 AM, John McKown
 wrote:
> Well, I did a git clone to fetch the LibreOffice source. It appears to
> be a mixture of Java, C, and C++. Just some stats:
>
> $find . -name '*.c' | wc
> 108 1083908
> ~/source-oem/libreoffice-core$find . -name '*.cpp' | wc
>  26  261360
> ~/source-oem/libreoffice-core$find . -name '*.java' | wc
>33293329  206921
> ~/source-oem/libreoffice-core$find . -name '*.cxx' | wc
>95629562  405662
>
> And, for "fun", I ran: for i in c cpp cxx java;do echo "$i";wc $(find
> . -name "*.${i}");done |& tee mckown
> to get the number of lines of code in each of those.
> c == 45,322 lines
> cpp == 4,600 lines
> cxx == 5,162,525
> java == 616,578 lines
>
> So C++ "wins" by about a 3:1 ratio of number of files and 10:1 ratio
> in terms of lines of code over Java.
>

There is also a breakdown of libre office code here with graphs and a table:
https://www.openhub.net/p/libreoffice/analyses/latest/languages_summary

The corresponding page for sqlite is:
https://www.openhub.net/p/sqlite/analyses/latest/languages_summary
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Window functions?

2014-08-28 Thread Gabor Grothendieck
The wording in the cited link is that

"Queries of the form: "SELECT max(x), y FROM table" returns the value
of y on the same row that contains the maximum x value."

There is some question of whether min(x) is "of the form" max(x).

On Thu, Aug 28, 2014 at 10:28 AM, Clemens Ladisch  wrote:
> Adam Devita wrote:
 select id, category_id, name, min(price) as minprice
from cat_pictures
 group by category_id;

>>
>> I'd be reluctant to write that query because it is non standard SQL and I
>> can't easily (5 minutes of searching) point at a document that tells me the
>> expected behavior.
>
> The SQL standard does not allow it.
>
> SQLite allows it for bug compatibility with MySQL.
> (The returned values are from some random row.)
>
> In SQLite 3.7.11 or later, the behaviour is defined:
> 
> but IIRC this was the wish of a paying customer, and is
> not documented anywhere else.
>
>
> Regards,
> Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] RPAD/LPAD

2014-03-06 Thread Gabor Grothendieck
On Thu, Mar 6, 2014 at 8:41 PM, RSmith <rsm...@rsweb.co.za> wrote:
>
> On 2014/03/07 01:59, Gabor Grothendieck wrote:
>>
>>
>>>
>>>>
>>>>> A small enhancement request:
>>>>>
>>>>> It would be great if the RPAD and LPAD functions could be implemented
>>>>> in
>>>>> sqlite.
>>>>>
>>>> The SQLite you can get the effect of RPAD(x,y) using PRINTF('%-*s',y,x).
>>>> See http://www.sqlite.org/lang_corefunc.html#printf for details.
>>>
>>> Thanks, but you snipped the relevant part of my post:
>>> "I know I can easily achieve the equivalent ... but if the functions were
>>> available natively it would avoid the need to hack third party SQL scripts."
>>>
>> I have also found that it was tedious to retarget MySQL scripts to
>> SQLite because many of the function calls are different.  Its not just
>> rpad and lpad but other functions too.
>
>
> Speaking as someone who retargets (nice word btw.) SQL scripts often, yes I
> agree, it's a bit of a chore to retarget SQL scripts to SQLite sometimes,
> but not really moreso than retargeting a script from MSSQL to PostGres or

I have also retargeted MySQL scripts to H2 and it was easier than to SQLite.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] RPAD/LPAD

2014-03-06 Thread Gabor Grothendieck
On Thu, Mar 6, 2014 at 6:29 PM, Walter Hurry  wrote:
> Richard Hipp wrote:
>
>> On Thu, Mar 6, 2014 at 3:41 PM, Walter Hurry  wrote:
>>
>>> A small enhancement request:
>>>
>>> It would be great if the RPAD and LPAD functions could be implemented in
>>> sqlite.
>>>
>>
>> The SQLite you can get the effect of RPAD(x,y) using PRINTF('%-*s',y,x).
>> See http://www.sqlite.org/lang_corefunc.html#printf for details.
>
> Thanks, but you snipped the relevant part of my post:
>
> "I know I can easily achieve the equivalent ... but if the functions were 
> available natively it would avoid the need to hack third party SQL scripts."
>

I have also found that it was tedious to retarget MySQL scripts to
SQLite because many of the function calls are different.  Its not just
rpad and lpad but other functions too.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion to add "locate" as a broader version of "instr"

2014-02-15 Thread Gabor Grothendieck
On Fri, Feb 14, 2014 at 2:33 PM, Max Vlasov  wrote:
> Hi,
>
> Some time ago when there was no "instr" functions, I looked at Mysql help
> pages and implemented a user function "locate" as the one that allows
> searching starting a particular position in the string. With two parameters
> form it was just identical to "instr" only the order of parameters was
> reversed. As I see, the latest sqlite has only "instr".
>
> It's not a big deal, but I noticed that "locate" with three parameters
> becomes convenient for CTE recursive queries since it allows search
> sequentially in the string. For example, a little bulky at last, but I
> managed to do "comma-list to dataset" query
>
> I suppose implementing "locate" and doing "instr" as a call to "locate"
> would cost the developers probably no more than a hundred of bytes for the
> final binary


Parsing fields is also done with substring_index in MySQL and having both locate
and substring_index would be useful for MySQL compatibility.

Parsing fields created using group_concat is one particular example.
One related
item is that in MySQL group_concat can specify the order of rows to be
concatenated
as well as a number of other aspects not currently available in SQLite.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Proposed enhancement to the sqlite3.exe command-line shell

2014-02-10 Thread Gabor Grothendieck
On Mon, Feb 10, 2014 at 1:56 PM, Petite Abeille
<petite.abei...@gmail.com> wrote:
>
> On Feb 10, 2014, at 7:39 PM, Gabor Grothendieck <ggrothendi...@gmail.com> 
> wrote:
>
>> That should have read right join.
>
> My personal opinion? Anyone even considering using a right outer join should 
> be cursed into repeating their first day at high school. For ever. Groundhog 
> Day, The High School Years.
>

Right joins are important since they generalize subscripting.  For
example, if X and Y are data tables (created using the R data.table
package) then X[Y] is a right join of X and Y.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Proposed enhancement to the sqlite3.exe command-line shell

2014-02-10 Thread Gabor Grothendieck
That should have read right join.  Its a nuisance when you are trying
to show someone SQL and trying to keep things simple that you have to
add the complexity of switching the arguments around.

I am still on 3.7.17 which is the version that currently ships with
the software I am using but its nice to know that I will have rfc4180
when its upgraded.

On Mon, Feb 10, 2014 at 1:26 PM, Petite Abeille
<petite.abei...@gmail.com> wrote:
>
> On Feb 10, 2014, at 5:19 PM, Gabor Grothendieck <ggrothendi...@gmail.com> 
> wrote:
>
>> The other features that would make teaching a bit easier would be to support
>> left join explicitly and support the rfc4180 standard for csv files.
>
> Hmmm?
>
> Left join:
> http://www.sqlite.org/syntaxdiagrams.html#join-operator
>
> RFC-4180 compliant .import:
> http://sqlite.org/releaselog/3_8_0.html
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Proposed enhancement to the sqlite3.exe command-line shell

2014-02-10 Thread Gabor Grothendieck
On Mon, Feb 10, 2014 at 10:23 AM, Richard Hipp  wrote:
> The Problem:
>
> Many new users (especially university students taking a "database 101"

The other features that would make teaching a bit easier would be to support
left join explicitly and support the rfc4180 standard for csv files.

> class) download the "sqlite3.exe" file from the SQLite website,
> double-click on the "sqlite3" icon to get a command-line shell, then start
> typing SQL statements.  But when they exit the shell, they are distressed
> to discover that their database has disappeared.
>
> Proposed Change To Address The Problem:
>
> When launching sqlite3.exe with a double-click, have it open a standard
> database in a standard place instead of an in-memory database as you would
> get when launching sqlite3.exe with no arguments.  Possibly also give
> additional hints, such as references to the ".open" command, when launching
> by double-click.
>
> (1) Detect double-click launch by looking at argc and argv.  On a
> double-click launch, argc==1 and argv[0] contains the full pathname of the
> executable.  On a command-line launch, argv[0] contains whatever the user
> typed, which is usually not the full pathname
>

I assume that means that if you do not keep sqlite3 on your path then you must
use:

   /path/to/sqlite3 :memory:

to call sqlite3 with an in-memory database. I am not so enthusiastic about this.

How about as an alternative that it works as it does now but when you
exit it asks you
if you want to save the database.  That seems more consistent with how
other programs
(editors, word processors, spreadsheets, etc.) work.

> (2) This change would be for Windows only.  The code to implement it would
> be enclosed in #ifdef _WIN32 ... #endif
>
> (3) Announce the name of the "standard" database file in the banner.
>
> Questions:
>
> (4) What should the name of the "standard" database file be?
>
> (5) In what folder should the "standard" database file be created?
>

%appdata%\sqilte
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-12 Thread Gabor Grothendieck
PostgreSQL supports

   create index  on  (  )

Note that it allows an expression and not just a column name.

See:

  http://www.postgresql.org/docs/9.1/static/sql-createindex.html

Perhaps indexing the expression in question would be an alternative
that would keep the performance info separate from the select.




On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp  wrote:
> There is a survey question at the bottom of this message.  But first some
> context...
>
> Over on the sqlite-dev mailing list, a debate has been going on about the
> best way to provide some useful hints to the query planner.  The query
> under discussion looks like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> Assuming that the schema has appropriate indices and ANALYZE has been run,
> SQLite does a good job of selecting an efficient query plan for the above.
> But the query planner lacks a key piece of information that could help it
> to do a better job.  In particular, the query planner does not know how
> often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
> out, the best query plan depends critically on this one fact.
>
> By default, the query planner (in SQLite 3.8.0) assumes that a
> subexpression that cannot use an index will always be true.  Probably this
> will be tweaked in 3.8.1 so that such subexpressions will be assumed to
> usually, but not always, be true.  Either way, it would be useful to be
> able to convey to the query planner the other extreme - that a
> subexpression is usually not true.
>
> (Pedantic detail:  "not true" is not the same as "false" in SQL because
> NULL is neither true nor false.)
>
> There is currently code in a branch that provides a hinting mechanism using
> a magic "unlikely()" function.  Subexpressions contained within
> "unlikely()" are assumed to usually not be true.  Other than this hint to
> the query planner, the unlikely() function is a complete no-op and
> optimized out of the VDBE code so that it does not consume any CPU cycles.
> The only purpose of the unlikely() function is to let the query planner
> know that the subexpression contained in its argument is not commonly
> true.  So, if an application developer knows that the string "bach" seldom
> occurs in composer names, then she might rewrite the query like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> The query planner might use this "likelihood" hint to choose a different
> query plan that works better when the subexpression is commonly false.  Or
> it might decide that the original query plan was good enough and ignore the
> hint.  The query planner gets to make that decision.  The application
> developer is not telling the query planner what to do. The application
> developer has merely provided a small amount of meta-information about the
> likelihood of the subexpression being true, meta-information which the
> query planner may or may not use.
>
> Note that the subexpression does not have to be a LIKE operator.
> PostgreSQL, to name one example, estimates how often a LIKE operator will
> be true based on the pattern on its right-hand side, and adjust query plans
> accordingly, and some have argued for this sort of thing in SQLite.  But I
> want a more general solution.  Suppose the subexpression involves one or
> more calls to application-defined functions about which the query planner
> cannot possible know anything.  A general mechanism for letting the query
> planner know that subexpressions are commonly not true is what is desired -
> not a technique for making LIKE operators more efficient.
>
> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Re: [sqlite] MONTH function

2013-06-23 Thread Gabor Grothendieck
Or even:

"select cast(strftime('%m') as integer)=6"

On Sun, Jun 23, 2013 at 4:16 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
> Which can also be written as:
>
> "select cast(strftime('%m','now') as integer)=6"
>
> On Sun, Jun 23, 2013 at 4:11 PM, Patrik Nilsson
> <nipatriknils...@gmail.com> wrote:
>> You can write:
>>
>> "select cast(strftime('%m',datetime('now')) as integer)=6"
>> 1
>>
>>
>> On 06/23/2013 09:45 PM, Lucas wrote:
>>> Hello,
>>>
>>> I am testing SQLIte as we are considering to change our DB Server but I
>>> found that a feature is missing, the function Month:
>>>
>>> SELECT SUM( NINGRESO ) AS ndev FROM APUNTES WHERE MONTH(FECHA) = 1
>>>
>>> This works perfect under MySQL or MSSQL.
>>>
>>> Do you plan to introduce MONTH and YEAR functions?.
>>>
>>>
>>> Also, the function strftime('%m', fecha) does not solve the issue:
>>>
>>> SELECT SUM( NINGRESO ) AS ndev FROM APUNTES WHERE strftime('%m', fecha) = 1
>>>
>>>
>>> Please, any idea of how to solve it.
>>>
>>>
>>> Thank you. Best regards,
>>>
>>> Lucas de Beltran
>>> Caritas España
>>>
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>> --
>> ASCII ribbon campaign ( )
>>  against HTML e-mail   X
>>  www.asciiribbon.org  / \
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] MONTH function

2013-06-23 Thread Gabor Grothendieck
Which can also be written as:

"select cast(strftime('%m','now') as integer)=6"

On Sun, Jun 23, 2013 at 4:11 PM, Patrik Nilsson
 wrote:
> You can write:
>
> "select cast(strftime('%m',datetime('now')) as integer)=6"
> 1
>
>
> On 06/23/2013 09:45 PM, Lucas wrote:
>> Hello,
>>
>> I am testing SQLIte as we are considering to change our DB Server but I
>> found that a feature is missing, the function Month:
>>
>> SELECT SUM( NINGRESO ) AS ndev FROM APUNTES WHERE MONTH(FECHA) = 1
>>
>> This works perfect under MySQL or MSSQL.
>>
>> Do you plan to introduce MONTH and YEAR functions?.
>>
>>
>> Also, the function strftime('%m', fecha) does not solve the issue:
>>
>> SELECT SUM( NINGRESO ) AS ndev FROM APUNTES WHERE strftime('%m', fecha) = 1
>>
>>
>> Please, any idea of how to solve it.
>>
>>
>> Thank you. Best regards,
>>
>> Lucas de Beltran
>> Caritas España
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> --
> ASCII ribbon campaign ( )
>  against HTML e-mail   X
>  www.asciiribbon.org  / \
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Running on windows 98

2013-01-24 Thread Gabor Grothendieck
On Thu, Jan 24, 2013 at 2:21 PM, Richard Hipp  wrote:
> On Thu, Jan 24, 2013 at 2:01 PM, Jose F. Gimenez wrote:
>
>> Richard,
>>
>> thanks for replying.
>>
>>
>>  We have no way of testing SQLite on Win9x and so we do not intend to
>>> support Win9x moving forward.  Some older versions of SQLite are known to
>>> work on Win9x.  If you are still supporting Win9x applications, I suggest
>>> you use those older versions of SQLite.
>>>

How about just supporting a compile time option to turn on or off that
optimization (on by default)?  Then those compiling for Win9x could
just turn it off yet it would not require explicit support and testing
of Win9x since its the option being supported rather than the platform
support.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] just a test

2012-12-08 Thread Gabor Grothendieck
I am still having problems with Igor's gmail messages being marked as
spam in gmail but after the upteenth time declaring them not to be
spam google finally asked me if I wanted to report it to their gmail
team so hopefully they will fix it soon.

On Mon, Dec 3, 2012 at 11:59 PM, Clive Hayward  wrote:
> Igor's messages sometimes get marked as spam by gmail.
>
> --
> Clive Hayward
>
>
> On 2012-12-03, at 7:57 AM, e-mail mgbg25171  
> wrote:
>
>> I've posted a couple of mails lately...I'm not getting them via the list or
>> any responses.
>> Admin says Igor responded to one of them...Thanks Igor!
>> This is just a test to see if the mail is coming to me (as a member of the
>> list).
>> Therefore please just ignore this.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQL:2003 -- Window Functions

2012-09-20 Thread Gabor Grothendieck
On Wed, Sep 19, 2012 at 12:51 PM, joe.fis...@tanguaylab.com
 wrote:
> Too bad SQLite doesn't yet support SQL Window Functions.
>
> Are there any SQLite Extension Libraries that support "SQL:2003 type Window
> Functions"?
> I specifically need LEAD and LAG to calculate an event integer timestamp
> delta between consecutive rows.
> I've played with some self-join code but that's proving to be complicated.
>

SQL Window Functions is the number one feature that I could use as
well.  In R, sqlite can be used for manipulating R data frames via the
sqldf package and this sort of functionality would be very useful.
(sqldf also handles PostgreSQL which does have windowing functions but
PostgreSQL requires more set up than sqlite so its not as accessible
to users.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Details on New Features

2012-05-06 Thread Gabor Grothendieck
On Sun, May 6, 2012 at 8:00 PM, Donald Griggs  wrote:
> Regarding:   What precisely are the
> "improvements" in handling of CSV inputs?
>
>
> Gabor, I don't know about "precisely" -- I'll let others on the list tell
> me where I'm off, but here's my take:
>
>
> A lot of strange things call themselves csv, but the change attempts to
> make the sqlite3 utility's CSV inputs perform a bit more closely to RFC4180.
>     http://tools.ietf.org/html/rfc4180
>
> http://en.wikipedia.org/wiki/Comma-separated_values#Toward_standardization
>
> In particular, during CSV mode import:
>  -- Allow any field to be surrounded by double quote characters without
> those characters being considered part of the field data.
>  -- Allow fields to contain embedded commas (or other separators) when the
> field is surrounded by double quote characters.
>  -- Allow fields to span multiple lines if they are surrounded by double
> quote characters.
>  -- Allow the double quote character to be escaped by having two adjacent
> double quote characters. (But note that a field consisting solely of two
> double quote characters still represents an empty string field.)
>
>  -- On output in CSV mode, surround text fields with double quotes when
> needed.
>
>
> See check-in [93aa17d866]   http://www.sqlite.org/src/info/93aa17d866
>

Thanks very much.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
On Fri, May 4, 2012 at 10:46 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
> On Fri, May 4, 2012 at 10:39 AM, Richard Hipp <d...@sqlite.org> wrote:
>> On Fri, May 4, 2012 at 10:33 AM, Gabor Grothendieck <ggrothendi...@gmail.com
>>> wrote:
>>
>>> On Fri, May 4, 2012 at 10:20 AM, Richard Hipp <d...@sqlite.org> wrote:
>>> > On Fri, May 4, 2012 at 10:06 AM, Rob Richardson <
>>> rdrichard...@rad-con.com>wrote:
>>> >
>>> >> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
>>> >>         Queries of the form: "SELECT max(x), y FROM table" returns the
>>> >> value of y on the same row that contains the maximum x value.
>>> >>
>>> >> Is that standard SQL behavior?  I'd have expected that to return one row
>>> >> for every row in the table.  To get the behavior described above, I'd
>>> use
>>> >> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>>> >>
>>> >
>>> > It is definitely NOT standard behavior.  The standard behavior is
>>> > undefined.  Or (with many SQL engines) it will throw an error if you
>>> have a
>>> > term in the result set that is not part of an aggregate function or an
>>> > element of the GROUP BY clause.  But lots of newbies expect SQL to work
>>> as
>>> > described in the 3.7.11 release comments, and we used to get support
>>> > questions because it did not.  And so rather than continue to answer the
>>> > questions over and over, I figured it would be easier to tweak SQLite to
>>> > reliably do what newbies expect.  I never anticipated that this change
>>> > would be so controversial or confusing.
>>> >
>>> > All the existing, portable, documented ways to find the maximum element
>>> of
>>> > one column while simultaneously finding the other elements in the same
>>> row,
>>> > continue to work as they always have.  You are not required to use this
>>> new
>>> > approach.  In fact, if you want your SQL to be portable, you should
>>> > probably avoid it.  By adding this feature, we had hoped to help
>>> > application developers avoid a common SQL programming error.  That's all.
>>> > There is nothing profound going on here.
>>>
>>> Can't anyone answer the question directly?  I would still like to know
>>> precisely what works and what does not.  Its not possible to rely on
>>> general SQL documentation for this so I think its important to
>>> document it exactly.  Otherwise, we are left to examine the source
>>> code or use trial and error (and these methods only tell you how it
>>> works but not how its intended to work and they could be different if
>>> there are bugs).  If the documentation does exist please point me to
>>> it but I did not find it on my own.
>>>
>>
>> If a single min() or max() aggregate function appears in a query, then any
>> other columns that are not contained within aggregate functions and that
>> are not elements of the GROUP BY will take values from one of the same rows
>> that satisfied the one min() or max() aggregate function.
>
> Thanks!  I expect that this will be useful for me for at least quick
> and dirty computations.
>
> I suggest that this statement be added to the docs if its not already there.

Also the other part of my question.  What were the improvements to the
csv import?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
On Fri, May 4, 2012 at 10:20 AM, Richard Hipp <d...@sqlite.org> wrote:
> On Fri, May 4, 2012 at 10:06 AM, Rob Richardson 
> <rdrichard...@rad-con.com>wrote:
>
>> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
>>         Queries of the form: "SELECT max(x), y FROM table" returns the
>> value of y on the same row that contains the maximum x value.
>>
>> Is that standard SQL behavior?  I'd have expected that to return one row
>> for every row in the table.  To get the behavior described above, I'd use
>> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>>
>
> It is definitely NOT standard behavior.  The standard behavior is
> undefined.  Or (with many SQL engines) it will throw an error if you have a
> term in the result set that is not part of an aggregate function or an
> element of the GROUP BY clause.  But lots of newbies expect SQL to work as
> described in the 3.7.11 release comments, and we used to get support
> questions because it did not.  And so rather than continue to answer the
> questions over and over, I figured it would be easier to tweak SQLite to
> reliably do what newbies expect.  I never anticipated that this change
> would be so controversial or confusing.
>
> All the existing, portable, documented ways to find the maximum element of
> one column while simultaneously finding the other elements in the same row,
> continue to work as they always have.  You are not required to use this new
> approach.  In fact, if you want your SQL to be portable, you should
> probably avoid it.  By adding this feature, we had hoped to help
> application developers avoid a common SQL programming error.  That's all.
> There is nothing profound going on here.

Can't anyone answer the question directly?  I would still like to know
precisely what works and what does not.  Its not possible to rely on
general SQL documentation for this so I think its important to
document it exactly.  Otherwise, we are left to examine the source
code or use trial and error (and these methods only tell you how it
works but not how its intended to work and they could be different if
there are bugs).  If the documentation does exist please point me to
it but I did not find it on my own.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
In this link:

  http://sqlite.org/releaselog/3_7_11.html

it refers to these new features:

Queries of the form: "SELECT max(x), y FROM table" returns the value
of y on the same row that contains the maximum x value.

Improvements to the handling of CSV inputs in the command-line shell

Is there documentation somewhere that defines exactly what these mean?
 Does the max(x) apply to min(x) too?  does the max have to precede
the y?  Can there be multiple y's?  What precisely are the
"improvements" in handling of CSV inputs?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Good books to lear SQL using SQLite?

2012-04-30 Thread Gabor Grothendieck
On Mon, Apr 30, 2012 at 10:18 AM, Arbol One  wrote:
> I am learning SQL using SQLite, can anyone tell me of a good book to learn 
> SQL using SQLite?
>

Its a web page and free software, not a book, but you might try this
page to start off and then get a book after you have used it a bit:

http://sqldf.googlecode.com

It uses R but you don't really have to know R to use it.  Note link
which is for people who want to try it and don't know R:
http://code.google.com/p/sqldf/#For_Those_New_to_R

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is it possible to use substrings of Windows DOS batch fiile parameters in sqlite3.exe -line db ?

2012-04-11 Thread Gabor Grothendieck
On Tue, Apr 10, 2012 at 7:14 PM, Frank Chang  wrote:
>
> Good evening, We are trying to generate automated SQLITE  SQL scripts based 
> on the names of SQLite tables derived by substring manipulation of Windows 
> DOS batch file and/or Windows environment variables. For example:
>
> /*   mary.bat */
> FOR /f %%a IN ('dir /b *.zip') DO CALL sub %%a
>
>
> /* sub.bat */
> set str=%1
> set camster=%str:~0.17%

The dot should be a comma.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how to wrie "" char in my html file?

2012-03-31 Thread Gabor Grothendieck
On Sat, Mar 31, 2012 at 2:55 AM, YAN HONG YE  wrote:
>>sqlite3 -html C:\mydatabase\mydzh.db "select ''">mm.html
> this command result is not  in the mm.html file, it's this following 
> text:
> table
> 
> not I wanted, how to wrie  char in my html file?

This seems to be the code I sent you privately except its been changed
from what I wrote so that it no longer works.  The original code did
not use -html .

Also for those who are more used to UNIX since we are on Windows here,
note that
 echo ""
and
 echo ''
do NOT work (the first outputs double quotes and the second does not
escape the angle brackets) but as an alternative this does work:
 echo ^
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Gabor Grothendieck
On Tue, Mar 27, 2012 at 3:02 PM, Simon  wrote:
> select closing_price, moving_average( funky_oscillator( closing_price ) )...

There is a moving average calculation in SQLite here but given the
complexity you might prefer to do the analytical portion in your
program:
http://code.google.com/p/sqldf/#Example_16._Moving_Average

It would be nice if sqlite had sql windowing functions to simplify
these sorts of calculations.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] HOW TO EXPORT TABLE HEAD

2012-03-27 Thread Gabor Grothendieck
On Mon, Mar 26, 2012 at 11:38 PM, YAN HONG YE  wrote:
> WHEN I export sqlite database to a html file or txt file, I couldn't know how 
> to include the database table head.
> who can tell me?
> Thank you!

Use -header like this:
  sqlite3 -html -header my.db "select * from mytable"

See output of:
  sqlite3 -help
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] more than 2000 columns

2011-11-14 Thread Gabor Grothendieck
On Mon, Nov 14, 2011 at 12:50 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 14 Nov 2011, at 5:11pm, Gabor Grothendieck wrote:
>
>> The requirement for a large number of columns is actually one thing
>> that is often needed when using sqlite from R.  Typically the use case
>> is that a user wishes to read a portion of an external file into R and
>> that file has thousands of columns.  For example, each row might be an
>> individual and each column is a gene.
>
> That would be a text file, right ?  So SQLite isn't involved in that.  You 
> have other routines to read text files.
>

Yes. Its text.  Also this is not application specific.  Its a general
facility that everyone uses so writing application specific routines
is completely out of the question here.

>> Or each row is a time point and
>> each column is a security (stock, bond, etc.)  The file may too large
>> to reasonably handle in memory so rather than deal with it in chunks
>> at a time its easier to just read it into sqlite in its entirety and
>> then pick off the portion you want into R using sql.
>
> So I actually have R on my Mac and I went and had a look.  You're talking 
> about using RSQLite ?  Yes, there are fast ways to move data between a SQLite 
> table and an R matrix.  I can see the appeal.  But you can also execute 
> arbitrary SQL commands.  So you can write your own import/export routine 
> which takes a very wide matrix from R but stores it in a less wide table in 
> SQLite.
>

What users want is to get access to their data with as little hassle
as possible and currently its possible to do it all in one line of R
code which sets up an sqlite database and table, reads the data into
it and then applies a given sql statement to that and finally destroys
the database.  Its trivial to do.  The only limitation is that the
file can have no more than 999 columns as its currently set up.  In
most cases that works but some people have wider files and there are
constantly requests to increase the limit.  All you do is provide the
filename, certain parameters such as the input field separator and
optionally the sql statement (which defaults to select * from file).
Its very easy from the user's point of view. Its just one line of
code.

> What I think you're trying to do is use a SQLite table as a data frame.  If I 
> understand correctly, this means you can use the commands you'd normally use 
> with an R matrix, but with data still stored in a SQLite table, without 
> having to rewrite the code of your program.  It's a really neat hack.
>
> Trouble is, SQLite isn't efficient with such wide tables.  Your code is going 
> to operate really slowly.  I'm going to have to get backup from SQLite 
> experts here, but I understand that unlike how matrices are stored in R, a 
> table's columns are stored in a structure like a linked list.  So if you try 
> to access the 1,400th column it has to walk a list of 1,399 items to find it. 
>  Which isn't efficient.
>

I am not aware of any performance tests on very wide files with sqlite
followed by moving them into R but with the usual files of just a
handful of columns it is sufficiently fast -- its so fast that at
times it can be faster to read it into sqlite and then from there into
R than reading the file straight into R (in those cases where both are
possible). We will see what happens when it gets expanded beyond 999.
A previous thread on this list suggested that there was no real
downside to expanding the limit.  I asked for clarification at the
time but no one responded.

> So fine.  Use SQLite to store tables as wide as you like.  But write your own 
> import/export commands to fetch appropriate parts into memory.  There's no 
> need to use a SQLite table 2000 columns wide just because your matrix is 2000 
> columns wide.  Purely a recommendation for serious software intended for 
> proper use.  Do anything you like in quick hacks: CPU time and memory usage 
> can be stupid big for those.
>

I am not clear on what you are suggesting but the way it works is that
the file gets read in its entirety into an sqlite database and then an
sql statement specified by the user is applied to that and only the
output of the sql statement ever gets sent to R so even if the input
has thousands of columns, the data sent from sqlite to R might not.

Hope that clarifies the situation.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] more than 2000 columns

2011-11-14 Thread Gabor Grothendieck
On Mon, Nov 14, 2011 at 12:21 PM, Igor Tandetnik <itandet...@mvps.org> wrote:
> On 11/14/2011 12:11 PM, Gabor Grothendieck wrote:
>>
>> The requirement for a large number of columns is actually one thing
>> that is often needed when using sqlite from R.  Typically the use case
>> is that a user wishes to read a portion of an external file into R and
>> that file has thousands of columns.  For example, each row might be an
>> individual and each column is a gene.  Or each row is a time point and
>> each column is a security (stock, bond, etc.)
>
> In relational databases, things like that are usually represented as
> GeneInfo(person, gene, infoAboutGene) or StockInfo(timestamp, stock, price)

That is a good point; however, in the context of this use case we are
dealing with external files and don't have control over their format.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] more than 2000 columns

2011-11-14 Thread Gabor Grothendieck
On Mon, Nov 14, 2011 at 5:32 AM, Simon Slavin  wrote:
>
> On 14 Nov 2011, at 7:38am, vinayh4 wrote:
>
>> I need to create table with more than 2000 columns, How to reset
>> SQLITE_MAX_COLUMN value which
>> is  2000 . Plz help me on this issue.
>
> You almost never need to have more columns than you can fit in your head at 
> one time.  The way you handle 2000 columns is to make a database for them !

The requirement for a large number of columns is actually one thing
that is often needed when using sqlite from R.  Typically the use case
is that a user wishes to read a portion of an external file into R and
that file has thousands of columns.  For example, each row might be an
individual and each column is a gene.  Or each row is a time point and
each column is a security (stock, bond, etc.)  The file may too large
to reasonably handle in memory so rather than deal with it in chunks
at a time its easier to just read it into sqlite in its entirety and
then pick off the portion you want into R using sql.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 7:11 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Tue, Nov 8, 2011 at 7:08 PM, Gabor Grothendieck
> <ggrothendi...@gmail.com>wrote:
>
>> On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp <d...@sqlite.org> wrote:
>> > On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
>> > <ggrothendi...@gmail.com>wrote:
>> >
>> >> In R, the RSQLite driver for SQLite currently has
>> >> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> >> for many different projects and on different platforms and it seems
>> >> that a number of these projects want a larger number.  Users don't
>> >> compile this themselves so they are stuck with whatever number is
>> >> compiled in for them.
>> >>
>> >> What are the considerations for setting this number?
>> >>
>> >> Is there any general advice that can be given on how to set this number?
>> >>
>> >
>> > I just checked and it appears that MacOS Lion compiles it a 50.
>> >
>> > Whenever you use a value like ?N, SQLite allocates an array of N objects,
>> > each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
>> > 36MB memory allocation (with its accompanying memset()).  That's really
>> the
>> > only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.
>>
>> If "SELECT ?50" allocates 50 * 72 bytes of memory then how
>> does that relate to SQLITE_MAX_VARIABLE_NUMBER?
>> SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
>> all.
>>
>
> SQLITE_MAX_VARIABLE_NUMBER determines the largest N for which ?N will
> work.  So by default, the maximum allocation is 999*72.  You can increase
> this to whatever you are comfortable with.
>

Thanks.  So there is really no downside to making it 10,000, say?
Those who want it that large will be able to have that many columns
and those who don't need that many won't incur any penalties.  Is that
right?


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 6:46 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Tue, Nov 8, 2011 at 5:50 PM, Gabor Grothendieck
> <ggrothendi...@gmail.com>wrote:
>
>> In R, the RSQLite driver for SQLite currently has
>> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> for many different projects and on different platforms and it seems
>> that a number of these projects want a larger number.  Users don't
>> compile this themselves so they are stuck with whatever number is
>> compiled in for them.
>>
>> What are the considerations for setting this number?
>>
>> Is there any general advice that can be given on how to set this number?
>>
>
> I just checked and it appears that MacOS Lion compiles it a 50.
>
> Whenever you use a value like ?N, SQLite allocates an array of N objects,
> each of 72 bytes in size.  So doing "SELECT ?50" on Lion requires a
> 36MB memory allocation (with its accompanying memset()).  That's really the
> only downside to choosing a really large SQLITE_MAX_VARIABLE_NUMBER.

If "SELECT ?50" allocates 50 * 72 bytes of memory then how
does that relate to SQLITE_MAX_VARIABLE_NUMBER?
SQLITE_MAX_VARIABLE_NUMBER did not seem to enter the calculation at
all.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
On Tue, Nov 8, 2011 at 5:55 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 8 Nov 2011, at 10:50pm, Gabor Grothendieck wrote:
>
>> In R, the RSQLite driver for SQLite currently has
>> SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
>> for many different projects and on different platforms and it seems
>> that a number of these projects want a larger number.
>
> What sort of statements are R users doing which might require binding a 
> thousand variables to one statement ?  I can't think of any situation like 
> this that doesn't indicate an insane schema which should be normalised.
>

Anyone who uses R could be using this so we don't really know.  We do
know that there seems to be multiple requests for increasing the
limit.

R is used for analyzing data and when when one is doing that one does
not always generate the data oneself but receives it from an external
source.  This may include files which may be too large to read into R
or might fit in but are too slow to read into R.  They might be read
into a database and then a portion read into R from the database.

One could imagine the rows might represent individuals and the columns
might represent a large number of genes.  Or perhaps each row is an
individual and each column is a health marker.  Or each row is a time
point and each column is a security.

Typically such users must use a different database but would have
preferred to use SQLite hence the question of what are the
considerations of coming up with a single SQLITE_MAX_VARIABLE_NUMBER
that everyone is stuck with.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLITE_MAX_VARIABLE_NUMBER

2011-11-08 Thread Gabor Grothendieck
In R, the RSQLite driver for SQLite currently has
SQLITE_MAX_VARIABLE_NUMBER set to 999.  This is used by many people
for many different projects and on different platforms and it seems
that a number of these projects want a larger number.  Users don't
compile this themselves so they are stuck with whatever number is
compiled in for them.

What are the considerations for setting this number?

Is there any general advice that can be given on how to set this number?

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] feed "endless" data into sqlite, thru a shell script

2011-09-27 Thread Gabor Grothendieck
On Tue, Sep 27, 2011 at 2:14 PM, David Garfield
 wrote:
> Any entry in a pipe could be buffering.  In a quick test here, awk is
> buffering.  To find the buffering, try using the pieces up to a given
> stage with " | cat " added at the end.  If this buffers, you've found
> the problem.  Unbuffered output is usually slower, so it is normally
> done only to a terminal.  I think the only easy way to externally
> disable the buffer is to wrap the program in a pseudo-tty.
> Alternatively, look for an option that lets you explicitly unbuffer.
> (for instance, in perl, do: $| = 1; )
>

gawk has fflush()
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Handling of IEEE-754 nan and +/-inf in SQLite?

2011-06-06 Thread Gabor Grothendieck
On Mon, Jun 6, 2011 at 11:54 AM, Richard Hipp  wrote:
> On Mon, Jun 6, 2011 at 11:44 AM, Sidney Cadot  wrote:
>
>>
>> Would it be useful to open a ticket on this issue, or will it never be
>> changed e.g. for fear of breaking backward compatibility?
>>
>
> There are approx 2 billion legacy apps in the wild that use SQLite.  Not
> breaking things is very important to us, therefore.  So there needs to be a
> compelling reason to make a change like this.  I do not think you have made
> a sufficient case for the change, yet.

I and others use SQLite from R and R itself gives Inf, -Inf and NaN
for 1/0, -1/0 and 0/0 respectively so it would reduce the differences
between the database and R if it worked in the same way.  Perhaps an
option could control this behavior so that backwards compatibility
could be maintained.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] pragma foreign_key_list deprecated in 3.7.4?

2010-12-08 Thread Gabor Grothendieck
On Wed, Dec 8, 2010 at 2:44 PM, Richard Hipp  wrote:
> On Wed, Dec 8, 2010 at 2:35 PM, Petite Abeille 
> wrote:
>
>> Hello,
>>
>> The pragma foreign_key_list appears to be deprecated in 3.7.4:
>>
>> http://www.sqlite.org/pragma.html#pragma_foreign_key_list
>>
>> Any reason for such deprecation?
>>
>
> Now that foreign key constraints are enforced natively, why would you want
> to have a list of them?  Why should the foreign_key_list pragma continue to
> consume code space and developer maintenance time?
>

Surely if the objective is to be small yet useful its important to
give users the capability to easily implement what would otherwise
have to be done in the database itself.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow SELECT Statements in Large Database file

2010-10-29 Thread Gabor Grothendieck
On Fri, Oct 29, 2010 at 12:07 PM, Jonathan Haws
 wrote:
> All,
>
> I am having some problems with a new database that I am trying to setup.
>
> This database is a large file (about 8.7 GB without indexing).  The problem
> I am having is that SELECT statements are extremely slow.  The goal is to
> get the database file up and running for an embedded application (we have
> tons of storage space so the size is not a problem).
>
> Here is the schema layout:
>
> CREATE table Dted (dtedkey INTEGER PRIMARY KEY, dted_lat
> INTEGER, dted_lon INTEGER, dted_alt FLOAT);
>
> We lookup dted_alt based on dted_lat and dted_lon.  Here is our SELECT
> statement:
>
> SELECT dted_alt FROM Dted WHERE (dted_lat=%d AND dted_lon=%d);
>
> The numbers fed to dted_lat and dted_lon are typically on the order of
> 37 and -111.
>
> What can we do to speed up our SELECT statements?  Minutes is
> unacceptable for our application.  We were hoping we could run somewhere
> on the order of 500 queries per second and get valid results back.
>
> I am not an SQL expert, but I was reading about indexes that that it is
> best to have a specific index per SELECT.  Since we only have one,
> this is the index I am creating now (it has been creating this index on my
> machine for the past 10 minutes now):
>
> CREATE INDEX idx_ll ON Dted(dted_lat,dted_lon);
>
> Is that a good index for my SELECT?  Will it speed up the accesses?
>
> Any thoughts?
>
>

Google for the spatialite extension.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] GUI for data entry

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 12:54 PM, Graham Smith  wrote:
> Tom,
>
> Thanks for this.
>
> My main reason for asking is because I am trying to encourage my
> students and indeed clients to think "database" rather than
> "spreadsheet". Most of the time these aren't big or complex data sets
> (normally records in the hundreds, sometimes the thousands) but still
> big enough to create major problems for themselves and me, just
> because the spreadsheet gives them the freedom to really screw things
> up.
>
> While far from perfect, I could live with a single table in a database
> that could be queried from R .  But it needs to be user friendly and
> run on Linux, Windows and Macs.
>

If the purpose of this is teaching with R then the R package sqldf
lets you query all R data frames in your session using sql as if they
were one big giant database.  If you stick with the few dozen data
frames that ship with R or ones you create yourself using various R
facilities then you don't have to enter anything in the first place.
See the sqldf home page at: http://sqldf.googlecode.com

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Involving in sqlite development

2010-10-08 Thread Gabor Grothendieck
On Thu, Oct 7, 2010 at 5:26 PM, sjtirtha  wrote:
> Hi,
>
> I'm interested involving in sqlite development.
> How can I start it?
>

If I can add to this question is there a posted wishlist,
todo list or roadmap?

Something like this:
http://www.h2database.com/html/roadmap.html

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Which data type is better for date?

2010-09-04 Thread Gabor Grothendieck
On Sat, Sep 4, 2010 at 7:31 AM, Mike Zang  wrote:
> I try to convert data to SQLite3 for iPad, please give me some detail
> suggestion.
>
> I think that I can save date value as below to SQLite3, I want to know
> which is better, or anything else if you have good idea.
>
> 1. integer as seconds since 1970
> 2. integer as days since 1970
> 3. string as '2010-09-03'
> 4. string as '10-09-03'

sqlite has julianday and date sql functions which convert back and
forth between julianday (number of days since noon in Greenwich on
November 24, 4714 B.C.) and -mm-dd representations and also handle
other manipulations in those formats so you probably want to choose
one of those.  -mm-dd does have the advantage that its easier to
look at the raw data in the database.

Also, if you are only dealing with dates and do not need to consider
time zones then its best to use a representation that uses neither
times nor time zones since those can introduce errors which are
artifacts of the representation.  time zone errors (confusion between
UTC and current time zone) can be particularly subtle.

sqlite> select date("now");
2010-09-04
sqlite> select date("2000-01-01", "+1 day");
2000-01-02
sqlite> select julianday(date("now")) - julianday("2010-09-01");
3.0
sqlite> select date(julianday(date("now")));
2010-09-04

See:
http://www.sqlite.org/cvstrac/wiki?p=DateAndTimeFunctions
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 5000 tables with 3000 records vs 1 table with 15000000 records

2010-08-22 Thread Gabor Grothendieck
On Sun, Aug 22, 2010 at 4:51 AM, Mike Zang  wrote:
> I have 5000 files and I want to converrt them to SQLite3 on iPad, now I
> have a performance question, I am not sure which way is better for select
> and insert data in SQLite3.
>
> I have  two ideas for converting.
>
> 1. convert 1 file to 1 table, so that I will have about 5000 tables in
> SQLIte3 database, and any file will have about 3000 records.
>
> 2. convert all 5000 files to 1 table, there will be 1500 records.
>
> Please give a suggestion before I start my programming.

Don't know how generalizable this is but I tried this with a single
10+ GB table made up of ~ 1000 similarly formatted tables of ~ 75k
records each.  This is running on a USB drive under Windows Vista and
SQLite version 3.6.18.  I tried adding two indexes.  The first "
create index " took a few minutes to an hour to create and I killed
the second " create index " after it was still going 24 hours later.
" select count(*) " takes 5 minutes to run but " select count(*) from
data where x = 'X' " returns immediately where x is the first column
in the index that succeeded.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Doing fine with SQLite

2010-05-04 Thread Gabor Grothendieck
You don't need to dump the data to a csv file and then read it into R
and there is no need to use the sqlite3 console at all as R's RSQLite
package can directly read and write SQLite databases.  Also see the
sqldf package.

On Tue, May 4, 2010 at 1:02 PM, Matt Young  wrote:
> I can work SQLite from by R stat package, but I am having hard time
> mixing special sqlite command intermixed with SQL statements when I
> send a text sequence to swqlite (even from the dos consol)
>
>
> sqlite3 test.db ".mode csv   select * from selected limit 4"
>
> Makes sqlite choke because I do not know what the inline terminator is
> for a  text invocation argument, and can't find it in the docs.
>
> That is my last minor detail, and I have indices, joins, and unions
> going on the Bureau of Labor Statistic under R.  Will make SQLite
> quite popular among the economists.  Getting access to reams of data
> from economic we sites, directly into R dataframes via a set of common
> key words familiar to economists.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [server] HTTP + SQLite bundled as single EXE?

2010-05-02 Thread Gabor Grothendieck
SQLite and R are already integrated through the RSQLite/DBI packages
and even higher level facilities (which I have posted about on this
thread) also exist.  I think that such an integration would logically
be done by R people rather than sqlite people.

In the discussion on this thread integration may mean different things
to different people so if you are referring to specific features
beyond what exists (such as a SQL functions written in R or stored
procedures written in R) then you can (1) speak with the RSQLite
package maintainer to see if he is interested in adding them or (2) if
you are interested in contributing then also discuss it with him, or
(3) if you are referring to specific features that make more sense in
certain other R packages (sqldf, RODBC, RJDBC, sqliteDF) then you can
communicate to the maintainers of those packages.

On Sun, May 2, 2010 at 8:38 PM, Matt Young  wrote:
> I want to see SQLite integrated into the R statistical package. R Project
> http://www.gardenersown.co.uk/Education/Lectures/R/regression.htm#multiple_regression
>
> for example.
>
> R statistical is very and becoming more popular, has great plotting,
> and wrestles data in frames that look awfully like sql tables.  R has
> built in procedure function, can cast text around fairly powerfully
> and so on.  It is begging for the sqlite engine.
>
> Like me, getting better access to government statistical table.
>
>
> On 5/2/10, Gilles Ganault  wrote:
>> Hello,
>>
>> It's probably quite an easy thing to do for someone well versed in C,
>> but I haven't seen a project that would combine a web server and
>> SQLite into a single EXE.
>>
>> Besides ease of deployment, this would offer a cross-platform solution
>> that wouldn't require developing a specific client-side connector,
>> since both hosts would speak HTTP.
>>
>> Would someone with enough know-how be interested in giving it a shot?
>> Are there technical reasons why it wouldn't be a good idea?
>>
>> Thank you.
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] scripting language interpreter

2010-05-01 Thread Gabor Grothendieck
On Sat, May 1, 2010 at 12:15 PM, Richard Hipp  wrote:
> On Sat, May 1, 2010 at 9:25 AM, Tim Romano  wrote:
>
>> I am aware that SQLite supports
>> loadable extensions, but would the SQLite architecture also permit the
>> integration of an interpreted scripting language?   An integrated scripting
>> language makes an already powerful database engine orders of magnitude more
>> useful especially when one is solving ad hoc data problems requiring very
>> rapid turnaround.
>>
>
> See http://www.sqlite.org/tclsqlite.html for the original.  SQLite began
> life as a TCL extension.  In fact, we often think of SQLite as a TCL
> extension that escaped into the wild.
>
> The integration between TCL and SQLite is very tight.  If you know where to
> look, you will see that many features of SQLite were designed specifically
> to support integration with TCL.
>
> An example of TCL using SQLite:
>
>  db eval {SELECT name FROM people WHERE personid=$personid} {
>     puts name=$name
>  }

If I understand correctly what is being illustrated here then the
sqldf package in R (http://sqldf.googlecode.com) has a similar
facility.  For example. from the R command line:

> # installs everything needed into R
> install.packages("sqldf")
>
> # loads everything needed into R workspace
> library(sqldf)
>
> # create R data frame (similar concept to an SQL table)
> DF <- data.frame(a = 1:3, b = 4:6)
>
> # the next statement notices that DF is an R data frame,
> # it automatically creates an sqlite data base in memory,
> # sets up table definition for DF by issuing create table stmt,
> # loads DF into the sqlite data base,
> # performs the query returning a new data frame
> # and deletes the sqlite data base
>
> sqldf("select * from DF where a < 3")
  a b
1 1 4
2 2 5

The actual interfaces between R and sqlite is in the DBI and RSQLite R
packages and sqldf sits on top of those.  The RSQLite package also
includes a copy of sqlite.  Installing and loading sqldf automatically
installs and loads its dependencies.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] STandard Deviation

2010-04-18 Thread Gabor Grothendieck
Not in sqlite itself but stddev_samp and stddev_pop are available in
the spatialite loadable extension.
Be careful since they interchanged sample and population in one
version of the extension.

On Sun, Apr 18, 2010 at 9:37 PM, Peter Haworth  wrote:
> Does Sqlite have a STDEV function?  Don;t see it listed under the core
> or aggregate functions.
> Thanks,
>
> Pete Haworth
>
>
>
>
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how to submit a file with sql to sqlite"

2010-04-18 Thread Gabor Grothendieck
On Sun, Apr 18, 2010 at 12:02 PM, Wensui Liu  wrote:
> dear listers,
> i am wondering if there is a way to submit a file with many sql
> statements, say several hundred lines,  to sqlite.
>
> thanks for your insight.

C:\tmp2>type a.sql
create table tab (a,b);
insert into tab values(1, 2);
insert into tab values(1, 2);
select * from tab;

C:\tmp2>sqlite3 a.db < a.sql
1|2
1|2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Northwind example database

2010-03-29 Thread Gabor Grothendieck
It would be nice if SQLite had a strptime-like function for things
like this and not just strftime.

On Mon, Mar 29, 2010 at 10:13 AM, Griggs, Donald
 wrote:
>
>
> On 27 Mar 2010, at 10:46am, GeoffW wrote:
>
>> Just for educational purposes I have been experimenting a little with
>> the Northwind Sqlite database contained on the sqlite official site.
>> Download link: http://download.vive.net/Northwind.zip.
>>
>> Am I misunderstanding here or are the dates in the wrong format for
>> sqlite within this converted database ?
>
>> Assuming it is wrong and not my understsanding, are there any easy
>> ways to get the dates reversed and corrected to sqlite order and
>> written back out to the database ?
>
>
> Perhaps this is better performed in the calling language, but the following 
> sql should reformat these dates.  Of course, you'd need to substitute and 
> repeat for the other fields.
>
>
> -- Reformat date from, e.g., '1/5/2010 12:00:00 AM'  to  '2010-01-05'
>
> update employees
>  set birthdate = replace (birthdate, ' 12:00:00 AM', '');
>
> update employees
>  set birthdate = '0' || birthdate
>  where substr(birthdate, 2,1) == '/';
>
>  update employees
>  set birthdate =
>       substr(birthdate, 1, 3)
>    || '0'
>    || substr(birthdate, 4,99)
>  where substr(birthdate, 5,1) == '/';
>
> -- Date should now be formatted as dd/mm/
> -- Now change to -mm-dd
>
> update employees
>  set birthdate =
>         substr(birthdate, 7,4)
>      || '-'
>      || substr(birthdate, 1,2)
>      || '-'
>      || substr(birthdate, 4,2);
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Get a specific sequence of rows...

2010-03-26 Thread Gabor Grothendieck
On Fri, Mar 26, 2010 at 5:00 AM, Fredrik Karlsson  wrote:
> Hi,
>
> I have a list of id:s stored in a field. I would now like to get some
> information from a table by these id:s, but exactly in this order. So,
> if I have a table
>
> 1 One
> 2 Two
> 3 Three
>
> and the sequence "3,1,2" stored somewhere, how do I get a neat list like
>
> Three
> One
> Two
>

Try this:

select *, 1 * (name = "Three") + 2 * (name = "One") + 3 * (name =
"Two") sorter from mytab where sorter > 0 order by sorter
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users