Thanks Jolyon, Silly me, I thought this was going to be simple.
Add Index, don't add index, it's a Boolean decision, I never expected NULL as an answer :) Steve Peacocke Mobile: +64 220 612-611 Linkedin Professional Profile<http://nz.linkedin.com/pub/steve-peacocke/1/a06/489> On Mon, Mar 31, 2014 at 8:56 AM, Jolyon Smith <[email protected]> wrote: > I don't think you can adopt a general rule for all boolean type conditions > in data. In the two example fields you cite, for example, I can see that > there is a potential difference in the nature of the booleans involved. > > ActiveRecord - looks like something that could change over time. A record > that was active may become inactive and I further speculate that there will > over time be far more inactive records than active ones. > > AccountTransactionType - looks like something that is fixed. The type of > a transaction seems unlikely to change once that transaction has been > recorded. You might call this a "static" boolean, as opposed to the more > "dynamic" nature of the previous example. > > > Of course, more specific domain knowledge may reveal these assumptions to > be invalid, but you get the general idea.... the characteristics of a > particular datum go beyond it's simple data type and those characteristics > in turn determine the most appropriate implementation (which in turn will > depend on whether the dominant context is OLTP or OLAP - i.e. efficiency of > creating/modifying data vs efficiency of queries). > > > In the case of "static" booleans for example, you might consider creating > separate tables for records of different values in this field. For > convenience of querying all records you can of course project a view which > unions the two (or more) tables involved, with a derived, virtual column > containing the discriminating field value. This also opens up the > possibility that the most efficient indexes for rows of a certain type > (i.e. now table) may well be different than those for the other. i.e. the > way you work with Income transactions might benefit from different indexes > than Expense transactions. > > On the other hand, the way you work with income and expense transactions > may mean that you are better off having indexes operating over ALL > transactions, regardless of Income/Expense type. > > See what I mean about "the best way" being dependent on far more than just > the data type ? > > And there's still more to it than that... > > w.r.t index selectivity, I am not convinced that the 1 / # of distinct > values metric is a particularly reliable measure. It surely assumes an > even distribution of distinct values across the data set ? > > i.e. if you have 100,000 records and they have a column where 50,000 rows > have one value and 50,000 have another, then yes, the efficiency and thus > the utility of any index on that value is going to be negligible (but then, > no better than having no index isn't actually *worse*, is it ? Although > there will be some overhead introduced in maintaining the index, though I > doubt this will itself be hugely significant). > > On the other hand, if only 1,000 of those 100,000 records have one value > and the remaining 99,000 have another, AND if your application most often > queries that table to select those in the smaller subset (the 1,000) then > whilst an index may not be of any benefit for querying the 99,000, it > surely will provide benefit for those queries that select the 1,000 (or > from among them), a benefit which *might* be worth the overhead of > maintaining that index even though it provides little/no benefit for the > handful/minority of queries that work with the 99,000 records ? > > > The bottom line is, there is no shortcut for properly understanding your > data and the way your application(s) work(s) with that data for correctly > tuning your database structure and metadata for optimal performance. > > :) > > > On 30 March 2014 15:19, Steve Peacocke <[email protected]> wrote: > >> Hi all, >> >> I'm playing around with a Firebird database and wanted to know from you >> DB experts out there how you handle booleans in a table. >> >> These could be as simple as >> ActiveRecord (Y/N) >> AccountTransactionType (I/E) - (Income or Expense) >> >> That last I would normally think would be "Income (Y/N)" so that would be >> a boolean too. >> >> My understanding is that this will never be indexed, even if you >> specifically add an index to it. So how do you handle it. There may be >> several boolean fields in a table definition. >> >> As these tables c an contain several hundred thousand records, this could >> potentially slow down any query to say total all records last 3 years where >> Active and Income - as the only index would then be on the date field, >> there is a possibility that this could potentially be a very slow query. >> >> I've heard of others creating another table to create, say, non-Avtive >> record ID's, but this one table could have several booleans, therefore >> creating several new tables (combining then into a single table with the >> field name would cause the same problem). >> >> Any thoughts? >> >> Steve Peacocke >> Mobile: +64 220 612-611 >> Linkedin Professional >> Profile<http://nz.linkedin.com/pub/steve-peacocke/1/a06/489> >> >> _______________________________________________ >> NZ Borland Developers Group - Delphi mailing list >> Post: [email protected] >> Admin: http://delphi.org.nz/mailman/listinfo/delphi >> Unsubscribe: send an email to [email protected] with >> Subject: unsubscribe >> > > > _______________________________________________ > NZ Borland Developers Group - Delphi mailing list > Post: [email protected] > Admin: http://delphi.org.nz/mailman/listinfo/delphi > Unsubscribe: send an email to [email protected] with > Subject: unsubscribe >
_______________________________________________ NZ Borland Developers Group - Delphi mailing list Post: [email protected] Admin: http://delphi.org.nz/mailman/listinfo/delphi Unsubscribe: send an email to [email protected] with Subject: unsubscribe
