Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
This is only to see a general trend how particular values change by age and
by gender in our patient group.
That is about 12000 patients in our practice.
The pattern won't change much over the years, so one snapshot for that
value will do.

RBS

On Sun, 22 Sep 2019, 22:32 Keith Medcalf,  wrote:

>
> Well, no.  It depends what the data is and what you are measuring.  For
> the "age" in the "patients" record to have any meaning, there can only be
> one sample and it must have been taken "now", and the results are only
> valid when the query is run "now".
>
> That is, if you have one (or more) samples per patient in the database,
> then the results classified by the present (at the time of the query) age
> of the patients are meaningless.  When you run the same query next year
> with exactly the same data, you will get an entirely different answer
> because all the patients will be a year older but otherwise nothing will
> have changed.  In order for the results to be constant you need to know the
> age of the patients at the time the sample was taken and use that as the
> classifier -- then it matters not when you run the actual query, the
> results will be constant for constant data.  Plus, of course, if you have
> multiple num_values per patient that implies different samples from the
> same patient taken at different times, so the fact that the patient is now
> 70 years old when there is one sample that was taken when the patient was 5
> is not comparable to a patient now 70 who had his one sample taken last
> week.  Nor if every patient had a sample taken each year and recorded in
> the database, does the present age of the patient provide any meaningful
> information at all.  (And of course we are assuming that the sample data is
> for Released/Final results and not for interim or retests of the same
> sample).
>
> Of course, the "patients" might not actually be "patients" but rather
> "samples", and the num_values are the final results of some analysis for
> that sample.  In which case if "create unique index results on num_values
> (id, text_term)" fails due to a constraint violation (more than one
> analysis result per sample) then the data contains errors that need to be
> resolved before meaningful results can be obtained.
>
> Typical lab databases will have a samplepoint (the thing from which the
> sample was collected).  There may then be multiple samples taken at
> different times, and on those samples multiple analysis may be run giving
> exactly one final result each.  Generally the analysis timestamp is not
> very useful (but it is usually recorded anyway).
>
> That is, there is exactly one final result per analysis per sample per
> samplepoint.
>
> >-Original Message-
> >From: sqlite-users  On
> >Behalf Of Bart Smissaert
> >Sent: Sunday, 22 September, 2019 13:51
> >To: General Discussion of SQLite Database  >us...@mailinglists.sqlite.org>
> >Subject: Re: [sqlite] What is wrong with this SQL?
> >
> >Thanks, interesting stuff! Will study this.
> >Don't comprehend it fully, but basically query 3 is no good then?
> >
> >RBS
> >
> >On Sun, 22 Sep 2019, 20:36 Keith Medcalf,  wrote:
> >
> >>
> >> On Sunday, 22 September, 2019 12:36, Bart Smissaert <
> >> bart.smissa...@gmail.com> wrote:
> >>
> >> >Have tested this now and no 1 is indeed fastest, with no 2 not far
> >> >behind.  3 is quite a bit slower and for some strange reason gives
> >> >a different value for the females aged 50.
> >> >Will look into this.
> >>
> >> Query 1 can run with either Patients or Num_values in the outer loop
> >> depending on the size of the tables, index availability, and any
> >available
> >> statistics.
> >>
> >> Query 2 requires that Patients be in the outer loop and that an index
> >> exist on num_values id and term_text (in any order as the first two
> >columns
> >> of the index) for efficient lookups.  It will do only one lookup into
> >the
> >> num_values per patients (one of the left joins will always fail, so it
> >will
> >> be short-circuited, but this short-circuiting will still take time to
> >> process) but may join multiple matching num_values records per patients
> >> record.
> >>
> >> Query 3 is the same as Query 2 but has the additional restriction that
> >it
> >> will only handle one num_values per patients, and it will always do two
> >> executions of the subquery per patients record (so it will always be
> >much
> &

Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Keith Medcalf

Well, no.  It depends what the data is and what you are measuring.  For the 
"age" in the "patients" record to have any meaning, there can only be one 
sample and it must have been taken "now", and the results are only valid when 
the query is run "now".

That is, if you have one (or more) samples per patient in the database, then 
the results classified by the present (at the time of the query) age of the 
patients are meaningless.  When you run the same query next year with exactly 
the same data, you will get an entirely different answer because all the 
patients will be a year older but otherwise nothing will have changed.  In 
order for the results to be constant you need to know the age of the patients 
at the time the sample was taken and use that as the classifier -- then it 
matters not when you run the actual query, the results will be constant for 
constant data.  Plus, of course, if you have multiple num_values per patient 
that implies different samples from the same patient taken at different times, 
so the fact that the patient is now 70 years old when there is one sample that 
was taken when the patient was 5 is not comparable to a patient now 70 who had 
his one sample taken last week.  Nor if every patient had a sample taken each 
year and recorded in the database, does the present age of the patient provide 
any meaningful information at all.  (And of course we are assuming that the 
sample data is for Released/Final results and not for interim or retests of the 
same sample).

Of course, the "patients" might not actually be "patients" but rather 
"samples", and the num_values are the final results of some analysis for that 
sample.  In which case if "create unique index results on num_values (id, 
text_term)" fails due to a constraint violation (more than one analysis result 
per sample) then the data contains errors that need to be resolved before 
meaningful results can be obtained.

Typical lab databases will have a samplepoint (the thing from which the sample 
was collected).  There may then be multiple samples taken at different times, 
and on those samples multiple analysis may be run giving exactly one final 
result each.  Generally the analysis timestamp is not very useful (but it is 
usually recorded anyway).

That is, there is exactly one final result per analysis per sample per 
samplepoint.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Bart Smissaert
>Sent: Sunday, 22 September, 2019 13:51
>To: General Discussion of SQLite Database us...@mailinglists.sqlite.org>
>Subject: Re: [sqlite] What is wrong with this SQL?
>
>Thanks, interesting stuff! Will study this.
>Don't comprehend it fully, but basically query 3 is no good then?
>
>RBS
>
>On Sun, 22 Sep 2019, 20:36 Keith Medcalf,  wrote:
>
>>
>> On Sunday, 22 September, 2019 12:36, Bart Smissaert <
>> bart.smissa...@gmail.com> wrote:
>>
>> >Have tested this now and no 1 is indeed fastest, with no 2 not far
>> >behind.  3 is quite a bit slower and for some strange reason gives
>> >a different value for the females aged 50.
>> >Will look into this.
>>
>> Query 1 can run with either Patients or Num_values in the outer loop
>> depending on the size of the tables, index availability, and any
>available
>> statistics.
>>
>> Query 2 requires that Patients be in the outer loop and that an index
>> exist on num_values id and term_text (in any order as the first two
>columns
>> of the index) for efficient lookups.  It will do only one lookup into
>the
>> num_values per patients (one of the left joins will always fail, so it
>will
>> be short-circuited, but this short-circuiting will still take time to
>> process) but may join multiple matching num_values records per patients
>> record.
>>
>> Query 3 is the same as Query 2 but has the additional restriction that
>it
>> will only handle one num_values per patients, and it will always do two
>> executions of the subquery per patients record (so it will always be
>much
>> slower than the others).
>>
>> Your observed difference will be because the Patients:Num_values where
>> Num_values is constrained to have term_text = 'Haemoglobin estimation'
>is
>> not a 1:1 relationship.  That is:
>>
>>   select patients.id, count(*)
>> from patients, num_values
>>where patients.id == num_values.id
>>  and num_values.term_text = 'Haemoglobin estimation'
>> group by patients.id
>>   having count(*) > 1;
>>
>> returns at least one row.  And those patients.id probably have age 50
>and
>> are Female ...
>>
>> >On Sun, 22 Sep 2019, 18:37 Keith Medcalf,  wrote:
>>

Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Thanks, interesting stuff! Will study this.
Don't comprehend it fully, but basically query 3 is no good then?

RBS

On Sun, 22 Sep 2019, 20:36 Keith Medcalf,  wrote:

>
> On Sunday, 22 September, 2019 12:36, Bart Smissaert <
> bart.smissa...@gmail.com> wrote:
>
> >Have tested this now and no 1 is indeed fastest, with no 2 not far
> >behind.  3 is quite a bit slower and for some strange reason gives
> >a different value for the females aged 50.
> >Will look into this.
>
> Query 1 can run with either Patients or Num_values in the outer loop
> depending on the size of the tables, index availability, and any available
> statistics.
>
> Query 2 requires that Patients be in the outer loop and that an index
> exist on num_values id and term_text (in any order as the first two columns
> of the index) for efficient lookups.  It will do only one lookup into the
> num_values per patients (one of the left joins will always fail, so it will
> be short-circuited, but this short-circuiting will still take time to
> process) but may join multiple matching num_values records per patients
> record.
>
> Query 3 is the same as Query 2 but has the additional restriction that it
> will only handle one num_values per patients, and it will always do two
> executions of the subquery per patients record (so it will always be much
> slower than the others).
>
> Your observed difference will be because the Patients:Num_values where
> Num_values is constrained to have term_text = 'Haemoglobin estimation' is
> not a 1:1 relationship.  That is:
>
>   select patients.id, count(*)
> from patients, num_values
>where patients.id == num_values.id
>  and num_values.term_text = 'Haemoglobin estimation'
> group by patients.id
>   having count(*) > 1;
>
> returns at least one row.  And those patients.id probably have age 50 and
> are Female ...
>
> >On Sun, 22 Sep 2019, 18:37 Keith Medcalf,  wrote:
> >
> >>
> >> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
> >> bart.smissa...@gmail.com> wrote:
> >>
> >> >Have this query:
> >>
> >> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS
> >AGE,
> >> >   AVG(BM.numeric_value) AS avg_Hb_Male,
> >> >   AVG(BF.numeric_value) as avg_Hb_Female
> >> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
> >> 'Male')
> >> >INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> >> 'Female')
> >> >WHERE BF.term_text = 'Haemoglobin estimation' and
> >> >  BM.term_text = 'Haemoglobin estimation'
> >> >GROUP BY AGE
> >> >HAVING AGE > 18 and AGE < 98
> >> >ORDER BY AGE ASC
> >>
> >> >Which always used to run fine and return rows. There is no error, but
> >> >there are no returned rows.
> >>
> >> What changed between when it was "working" and when it was "not
> >working"?
> >> As far as I can see it should never have been "working" because the
> >eqijoin
> >> constraints could never be satisfied except by a Patients that was both
> >> Male and Female at the same time in the same record (which is
> >impossible).
> >>
> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> >> Age,
> >>  avg(case when P.Sex == 'Male' then V.numeric_value else null
> >end)
> >> as avg_Hb_Male,
> >>  avg(case when P.Sex == 'Female' then V.numeric_value else null
> >> end) as avg_Hb_Female
> >> from Patients P, num_values V
> >>where P.id == V.id
> >>  and V.term_text == 'Haemoglobin estimation'
> >> group by Age
> >>   having Age between 19 and 97
> >> order by Age asc;
> >>
> >> or, using outer joins:
> >>
> >>select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
> >as
> >> Age,
> >>   avg(BM.numeric_value) as avg_Hb_Male,
> >>   avg(BF.numeric_value) as avg_Hb_Female
> >>  from Patients as P
> >> left join num_values as BM
> >>on P.id == BM.id
> >>   and P.Sex == 'Male'
> >>   and BM.term_text == 'Haemoglobin estimation'
> >> left join num_values as BF
> >>on P.id == BF.id
> >>   and P.Sex == 'Female'
> >>   and BF.term_text == 'Haemoglobin estimation'
> >>  group by Age
> >>having Age between 19 and 97
> >>  order by Age asc;
> >>
> >> or, using correlated subqueries:
> >>
> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> >> Age,
> >>  avg((
> >>   select numeric_value
> >> from num_values as V
> >>where P.id == V.id
> >>  and P.Sex == 'Male'
> >>  and V.term_text == 'Haemoglobin estimation'
> >> )) as avg_Hb_Male,
> >>  avg((
> >>   select numeric_value
> >> from num_values as V
> >>where P.id == V.id
> >>  and P.Sex == 'Female'
> >>  and V.term_text == 'Haemoglobin estimation'
> >> )) as avg_Hb_Female
> >> from Patients as P
> >> group by Age
> >>   having Age between 19 and 97
> >> order by Age asc;
> >>
> 

Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Keith Medcalf

On Sunday, 22 September, 2019 12:36, Bart Smissaert  
wrote:

>Have tested this now and no 1 is indeed fastest, with no 2 not far
>behind.  3 is quite a bit slower and for some strange reason gives 
>a different value for the females aged 50.
>Will look into this.

Query 1 can run with either Patients or Num_values in the outer loop depending 
on the size of the tables, index availability, and any available statistics.  

Query 2 requires that Patients be in the outer loop and that an index exist on 
num_values id and term_text (in any order as the first two columns of the 
index) for efficient lookups.  It will do only one lookup into the num_values 
per patients (one of the left joins will always fail, so it will be 
short-circuited, but this short-circuiting will still take time to process) but 
may join multiple matching num_values records per patients record.

Query 3 is the same as Query 2 but has the additional restriction that it will 
only handle one num_values per patients, and it will always do two executions 
of the subquery per patients record (so it will always be much slower than the 
others).

Your observed difference will be because the Patients:Num_values where 
Num_values is constrained to have term_text = 'Haemoglobin estimation' is not a 
1:1 relationship.  That is:

  select patients.id, count(*) 
from patients, num_values
   where patients.id == num_values.id
 and num_values.term_text = 'Haemoglobin estimation'
group by patients.id
  having count(*) > 1;

returns at least one row.  And those patients.id probably have age 50 and are 
Female ...

>On Sun, 22 Sep 2019, 18:37 Keith Medcalf,  wrote:
>
>>
>> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
>> bart.smissa...@gmail.com> wrote:
>>
>> >Have this query:
>>
>> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS
>AGE,
>> >   AVG(BM.numeric_value) AS avg_Hb_Male,
>> >   AVG(BF.numeric_value) as avg_Hb_Female
>> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
>> 'Male')
>> >INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
>> 'Female')
>> >WHERE BF.term_text = 'Haemoglobin estimation' and
>> >  BM.term_text = 'Haemoglobin estimation'
>> >GROUP BY AGE
>> >HAVING AGE > 18 and AGE < 98
>> >ORDER BY AGE ASC
>>
>> >Which always used to run fine and return rows. There is no error, but
>> >there are no returned rows.
>>
>> What changed between when it was "working" and when it was "not
>working"?
>> As far as I can see it should never have been "working" because the
>eqijoin
>> constraints could never be satisfied except by a Patients that was both
>> Male and Female at the same time in the same record (which is
>impossible).
>>
>>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
>> Age,
>>  avg(case when P.Sex == 'Male' then V.numeric_value else null
>end)
>> as avg_Hb_Male,
>>  avg(case when P.Sex == 'Female' then V.numeric_value else null
>> end) as avg_Hb_Female
>> from Patients P, num_values V
>>where P.id == V.id
>>  and V.term_text == 'Haemoglobin estimation'
>> group by Age
>>   having Age between 19 and 97
>> order by Age asc;
>>
>> or, using outer joins:
>>
>>select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
>as
>> Age,
>>   avg(BM.numeric_value) as avg_Hb_Male,
>>   avg(BF.numeric_value) as avg_Hb_Female
>>  from Patients as P
>> left join num_values as BM
>>on P.id == BM.id
>>   and P.Sex == 'Male'
>>   and BM.term_text == 'Haemoglobin estimation'
>> left join num_values as BF
>>on P.id == BF.id
>>   and P.Sex == 'Female'
>>   and BF.term_text == 'Haemoglobin estimation'
>>  group by Age
>>having Age between 19 and 97
>>  order by Age asc;
>>
>> or, using correlated subqueries:
>>
>>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
>> Age,
>>  avg((
>>   select numeric_value
>> from num_values as V
>>where P.id == V.id
>>  and P.Sex == 'Male'
>>  and V.term_text == 'Haemoglobin estimation'
>> )) as avg_Hb_Male,
>>  avg((
>>   select numeric_value
>> from num_values as V
>>where P.id == V.id
>>  and P.Sex == 'Female'
>>  and V.term_text == 'Haemoglobin estimation'
>> )) as avg_Hb_Female
>> from Patients as P
>> group by Age
>>   having Age between 19 and 97
>> order by Age asc;
>>
>> The first one is likely to be the fastest to execute since it does the
>> least record lookups and the optimizer has the best chance at
>optimizing
>> the query.
>>
>>
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>___

Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Have tested this now and no 1 is indeed fastest, with no 2 not far behind.
3 is quite a bit slower and for some strange reason gives a different value
for the females aged 50.
Will look into this.

RBS

On Sun, 22 Sep 2019, 18:37 Keith Medcalf,  wrote:

>
> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
> bart.smissa...@gmail.com> wrote:
>
> >Have this query:
>
> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
> >   AVG(BM.numeric_value) AS avg_Hb_Male,
> >   AVG(BF.numeric_value) as avg_Hb_Female
> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
> 'Male')
> >INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> 'Female')
> >WHERE BF.term_text = 'Haemoglobin estimation' and
> >  BM.term_text = 'Haemoglobin estimation'
> >GROUP BY AGE
> >HAVING AGE > 18 and AGE < 98
> >ORDER BY AGE ASC
>
> >Which always used to run fine and return rows. There is no error, but
> >there are no returned rows.
>
> What changed between when it was "working" and when it was "not working"?
> As far as I can see it should never have been "working" because the eqijoin
> constraints could never be satisfied except by a Patients that was both
> Male and Female at the same time in the same record (which is impossible).
>
>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>  avg(case when P.Sex == 'Male' then V.numeric_value else null end)
> as avg_Hb_Male,
>  avg(case when P.Sex == 'Female' then V.numeric_value else null
> end) as avg_Hb_Female
> from Patients P, num_values V
>where P.id == V.id
>  and V.term_text == 'Haemoglobin estimation'
> group by Age
>   having Age between 19 and 97
> order by Age asc;
>
> or, using outer joins:
>
>select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>   avg(BM.numeric_value) as avg_Hb_Male,
>   avg(BF.numeric_value) as avg_Hb_Female
>  from Patients as P
> left join num_values as BM
>on P.id == BM.id
>   and P.Sex == 'Male'
>   and BM.term_text == 'Haemoglobin estimation'
> left join num_values as BF
>on P.id == BF.id
>   and P.Sex == 'Female'
>   and BF.term_text == 'Haemoglobin estimation'
>  group by Age
>having Age between 19 and 97
>  order by Age asc;
>
> or, using correlated subqueries:
>
>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>  avg((
>   select numeric_value
> from num_values as V
>where P.id == V.id
>  and P.Sex == 'Male'
>  and V.term_text == 'Haemoglobin estimation'
> )) as avg_Hb_Male,
>  avg((
>   select numeric_value
> from num_values as V
>where P.id == V.id
>  and P.Sex == 'Female'
>  and V.term_text == 'Haemoglobin estimation'
> )) as avg_Hb_Female
> from Patients as P
> group by Age
>   having Age between 19 and 97
> order by Age asc;
>
> The first one is likely to be the fastest to execute since it does the
> least record lookups and the optimizer has the best chance at optimizing
> the query.
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Thanks.
What changed was probably the SQLCipher version for Android. Changed to the
64 bits version. It definitely worked fine before.
Will try your first query.

RBS

On Sun, 22 Sep 2019, 18:37 Keith Medcalf,  wrote:

>
> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
> bart.smissa...@gmail.com> wrote:
>
> >Have this query:
>
> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
> >   AVG(BM.numeric_value) AS avg_Hb_Male,
> >   AVG(BF.numeric_value) as avg_Hb_Female
> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
> 'Male')
> >INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> 'Female')
> >WHERE BF.term_text = 'Haemoglobin estimation' and
> >  BM.term_text = 'Haemoglobin estimation'
> >GROUP BY AGE
> >HAVING AGE > 18 and AGE < 98
> >ORDER BY AGE ASC
>
> >Which always used to run fine and return rows. There is no error, but
> >there are no returned rows.
>
> What changed between when it was "working" and when it was "not working"?
> As far as I can see it should never have been "working" because the eqijoin
> constraints could never be satisfied except by a Patients that was both
> Male and Female at the same time in the same record (which is impossible).
>
>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>  avg(case when P.Sex == 'Male' then V.numeric_value else null end)
> as avg_Hb_Male,
>  avg(case when P.Sex == 'Female' then V.numeric_value else null
> end) as avg_Hb_Female
> from Patients P, num_values V
>where P.id == V.id
>  and V.term_text == 'Haemoglobin estimation'
> group by Age
>   having Age between 19 and 97
> order by Age asc;
>
> or, using outer joins:
>
>select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>   avg(BM.numeric_value) as avg_Hb_Male,
>   avg(BF.numeric_value) as avg_Hb_Female
>  from Patients as P
> left join num_values as BM
>on P.id == BM.id
>   and P.Sex == 'Male'
>   and BM.term_text == 'Haemoglobin estimation'
> left join num_values as BF
>on P.id == BF.id
>   and P.Sex == 'Female'
>   and BF.term_text == 'Haemoglobin estimation'
>  group by Age
>having Age between 19 and 97
>  order by Age asc;
>
> or, using correlated subqueries:
>
>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> Age,
>  avg((
>   select numeric_value
> from num_values as V
>where P.id == V.id
>  and P.Sex == 'Male'
>  and V.term_text == 'Haemoglobin estimation'
> )) as avg_Hb_Male,
>  avg((
>   select numeric_value
> from num_values as V
>where P.id == V.id
>  and P.Sex == 'Female'
>  and V.term_text == 'Haemoglobin estimation'
> )) as avg_Hb_Female
> from Patients as P
> group by Age
>   having Age between 19 and 97
> order by Age asc;
>
> The first one is likely to be the fastest to execute since it does the
> least record lookups and the optimizer has the best chance at optimizing
> the query.
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Keith Medcalf

On Sunday, 22 September, 2019 04:10, Bart Smissaert  
wrote:

>Have this query:

>SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
>   AVG(BM.numeric_value) AS avg_Hb_Male,
>   AVG(BF.numeric_value) as avg_Hb_Female
>FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX = 'Male')
>INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX = 'Female')
>WHERE BF.term_text = 'Haemoglobin estimation' and
>  BM.term_text = 'Haemoglobin estimation'
>GROUP BY AGE
>HAVING AGE > 18 and AGE < 98
>ORDER BY AGE ASC

>Which always used to run fine and return rows. There is no error, but
>there are no returned rows. 

What changed between when it was "working" and when it was "not working"?  As 
far as I can see it should never have been "working" because the eqijoin 
constraints could never be satisfied except by a Patients that was both Male 
and Female at the same time in the same record (which is impossible).

  select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as Age,
 avg(case when P.Sex == 'Male' then V.numeric_value else null end) as 
avg_Hb_Male,
 avg(case when P.Sex == 'Female' then V.numeric_value else null end) as 
avg_Hb_Female
from Patients P, num_values V
   where P.id == V.id
 and V.term_text == 'Haemoglobin estimation'
group by Age
  having Age between 19 and 97
order by Age asc;

or, using outer joins:

   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as Age,
  avg(BM.numeric_value) as avg_Hb_Male,
  avg(BF.numeric_value) as avg_Hb_Female
 from Patients as P
left join num_values as BM
   on P.id == BM.id
  and P.Sex == 'Male'
  and BM.term_text == 'Haemoglobin estimation'
left join num_values as BF
   on P.id == BF.id
  and P.Sex == 'Female'
  and BF.term_text == 'Haemoglobin estimation'
 group by Age
   having Age between 19 and 97
 order by Age asc;

or, using correlated subqueries:

  select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as Age,
 avg((
  select numeric_value
from num_values as V
   where P.id == V.id
 and P.Sex == 'Male'
 and V.term_text == 'Haemoglobin estimation'
)) as avg_Hb_Male,
 avg((
  select numeric_value
from num_values as V
   where P.id == V.id
 and P.Sex == 'Female'
 and V.term_text == 'Haemoglobin estimation'
)) as avg_Hb_Female
from Patients as P
group by Age
  having Age between 19 and 97
order by Age asc;

The first one is likely to be the fastest to execute since it does the least 
record lookups and the optimizer has the best chance at optimizing the query.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Ok, what should the query be?

RBS

On Sun, 22 Sep 2019, 15:42 Clemens Ladisch,  wrote:

> Bart Smissaert wrote:
> > I did try left joins, but no data returned.
>
> All filters for outer-joined rows must be specified in the JOIN itself;
> in the WHERE clause, NULL values would make the comparison fail.
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Clemens Ladisch
Bart Smissaert wrote:
> I did try left joins, but no data returned.

All filters for outer-joined rows must be specified in the JOIN itself;
in the WHERE clause, NULL values would make the comparison fail.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Thanks, I did try left joins, but no data returned.

RBS


On Sun, Sep 22, 2019 at 12:22 PM Richard Damon 
wrote:

> On 9/22/19 6:10 AM, Bart Smissaert wrote:
> > Have this query:
> >
> > SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS
> AGE,
> >AVG(BM.numeric_value) AS avg_Hb_Male,
> >AVG(BF.numeric_value) as avg_Hb_Female
> > FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
> 'Male')
> > INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> > 'Female')
> > WHERE BF.term_text = 'Haemoglobin estimation' and
> >   BM.term_text = 'Haemoglobin estimation'
> > GROUP BY AGE
> > HAVING AGE > 18 and AGE < 98
> > ORDER BY AGE ASC
> >
> > Which always used to run fine and return rows. There is no error, but
> there
> > are no returned rows. I have checked all the where conditions and they
> are
> > all fine.
> > When I leave out either of the NUM_VALUES joins (so either male or female
> > patients) then it
> > runs fine and returns data as expected.
> >
> > Something like this runs fine as well and returns the right data:
> >
> > select p.sex, round(avg(n.numeric_value), 2) as avg_Hb
> > from patients p inner join num_values n on(p.id = n.id)
> > where n.term_text = 'Haemoglobin estimation'
> > group by p.sex
> >
> > Any idea what could be the problem here?
> > Running version 3.22.0.
> >
> > RBS
>
> If I am reading that right, the only way that query should have worked
> at all is if INNER JOIN wasn't being supported and had been converted to
> a LEFT JOIN.
>
> INNER JOIN only returns records that match BOTH tables, so to pass the
> first INNER JOIN the record need P.SEX = 'Male', and then to pass the
> second it would need
>
> P.SEX = 'Female' at the same time.
>
> --
> Richard Damon
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Richard Damon
On 9/22/19 6:10 AM, Bart Smissaert wrote:
> Have this query:
>
> SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
>AVG(BM.numeric_value) AS avg_Hb_Male,
>AVG(BF.numeric_value) as avg_Hb_Female
> FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX = 'Male')
> INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> 'Female')
> WHERE BF.term_text = 'Haemoglobin estimation' and
>   BM.term_text = 'Haemoglobin estimation'
> GROUP BY AGE
> HAVING AGE > 18 and AGE < 98
> ORDER BY AGE ASC
>
> Which always used to run fine and return rows. There is no error, but there
> are no returned rows. I have checked all the where conditions and they are
> all fine.
> When I leave out either of the NUM_VALUES joins (so either male or female
> patients) then it
> runs fine and returns data as expected.
>
> Something like this runs fine as well and returns the right data:
>
> select p.sex, round(avg(n.numeric_value), 2) as avg_Hb
> from patients p inner join num_values n on(p.id = n.id)
> where n.term_text = 'Haemoglobin estimation'
> group by p.sex
>
> Any idea what could be the problem here?
> Running version 3.22.0.
>
> RBS

If I am reading that right, the only way that query should have worked
at all is if INNER JOIN wasn't being supported and had been converted to
a LEFT JOIN.

INNER JOIN only returns records that match BOTH tables, so to pass the
first INNER JOIN the record need P.SEX = 'Male', and then to pass the
second it would need

P.SEX = 'Female' at the same time. 

-- 
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] What is wrong with this SQL?

2019-09-22 Thread Bart Smissaert
Have this query:

SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
   AVG(BM.numeric_value) AS avg_Hb_Male,
   AVG(BF.numeric_value) as avg_Hb_Female
FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX = 'Male')
INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
'Female')
WHERE BF.term_text = 'Haemoglobin estimation' and
  BM.term_text = 'Haemoglobin estimation'
GROUP BY AGE
HAVING AGE > 18 and AGE < 98
ORDER BY AGE ASC

Which always used to run fine and return rows. There is no error, but there
are no returned rows. I have checked all the where conditions and they are
all fine.
When I leave out either of the NUM_VALUES joins (so either male or female
patients) then it
runs fine and returns data as expected.

Something like this runs fine as well and returns the right data:

select p.sex, round(avg(n.numeric_value), 2) as avg_Hb
from patients p inner join num_values n on(p.id = n.id)
where n.term_text = 'Haemoglobin estimation'
group by p.sex

Any idea what could be the problem here?
Running version 3.22.0.

RBS
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users