Hi Layla, As mentioned by my colleges, Trafodion supports two types of index access: index only scan, or index join scan. The first type scans the index only when the select list can be satisfied by the columns defined in the index. The second type joins the rows obtained from index lookup back tot the base table, when the index only scan is not possible. Of course, there is another way to get the data from the base table without using and indexes. The compiler considers these three options for every scan, when indexes are available and useful.
On the first two histograms, my impression is that the frequencies of the distinct value(s) are very high. Too high that index only or index join scans are not preferred. For example, value 5 occurred 50,000 times in the first case. To get the index-only scan, you probably need to remove several rows from the table, and insert several rows with value 4 in index column back to the table. and use predicate (<index-column> = 4). On Thu, Aug 6, 2015 at 10:58 AM, Dave Birdsall <[email protected]> wrote: > Hi Layla, > > I'm not sure what you're asking when you ask about "Index Type". The > reference to B-trees makes me think you're asking about the physical > implementation of the indexes. > > Secondary indexes in Trafodion are just HBase tables, no different from > Trafodion tables, and therefore have physical characteristics similar to > base tables. That is, they are log-structured merge files, just like the > base tables. > > When the Optimizer chooses to use an index, it may be doing so for a couple > of reasons. It might, for example, discover that an index has all the > columns referenced in a query, but in a more pleasing order (for example, > perhaps the rows can be accessed directly, while in the base table a full > scan might be required). This would result in an index-only scan. > Alternatively, it might discover that an index doesn't contain all the > columns, but does offer direct access given the predicates which again > would > require a larger scan on the base table. If this is the case, the Optimizer > will consider joining the index to the base table, and compare the cost of > the join plan against a base table scan. > > In recent days, some tweaks and optimizations have been added to indexes, > which you probably don't have in your test bed yet. > > For example, a recent check-in allows indexes to be in "aligned row format" > independent of the base table. "Aligned row format" is a format where all > the Trafodion columns for a given row are packaged in a single cell from an > HBase perspective. It is more efficient when a workload doesn't have > updates. And in a given application, what are updates to a base table might > become deletes and inserts into an index (for example, if it is > predominately the index columns that are updated). So it might be more > optimal from a path length perspective to use a different physical row > format for an index than a base table. > > Down the road I would not be surprised if we make further such tweaks. > > Dave > > > > -----Original Message----- > From: Martin, Layla (HP DualStudy) [mailto:[email protected]] > Sent: Thursday, August 6, 2015 6:27 AM > To: [email protected] > Subject: RE: Index Type > > Hi QiFan, > > Thanks for your help! > > Can you tell me which index type Trafodion uses? (This is important for me, > as I'm supposed to write a paper for university about this) > > It always uses intervals = frequency. > > Thanks > Layla > > Here is the output for f=1,5,50: > freqency = 1 > >>showstats for table testdata3 on s_key detail; > > Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID: > 94086134586213909 > > Hist ID: 1838554956 > Column(s): S_KEY > Total Rows: 50000 > Total UEC: 1 > Low Value: (5) > High Value: (5) > Intervals: 1 > > Number Rowcount UEC Boundary > ====== =========== =========== ====================================== > 0 0 0 (5) > 1 50000 1 (5) > > > --- SQL operation complete. > > > frequency = 5 > >>showstats for table testdata3 on s_key detail; > > Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID: > 134055579012223698 > > Hist ID: 647527840 > Column(s): S_KEY > Total Rows: 50000 > Total UEC: 5 > Low Value: (0) > High Value: (4) > Intervals: 5 > > Number Rowcount UEC Boundary > ====== =========== =========== ====================================== > 0 0 0 (0) > 1 9812 1 (0) > 2 10092 1 (1) > 3 10029 1 (2) > 4 9981 1 (3) > 5 10086 1 (4) > > > --- SQL operation complete. > > > frequency=50 > >>showstats for table testdata3 on s_key detail; > > Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID: > 134055579012309892 > > Hist ID: 1526840203 > Column(s): S_KEY > Total Rows: 50000 > Total UEC: 50 > Low Value: (0) > High Value: (49) > Intervals: 50 > > Number Rowcount UEC Boundary > ====== =========== =========== ====================================== > 0 0 0 (0) > 1 1041 1 (0) > 2 991 1 (1) > 3 1010 1 (2) > 4 963 1 (3) > 5 976 1 (4) > 6 969 1 (5) > 7 981 1 (6) > 8 1050 1 (7) > 9 986 1 (8) > 10 1057 1 (9) > 11 1033 1 (10) > 12 1016 1 (11) > 13 990 1 (12) > 14 980 1 (13) > 15 1021 1 (14) > 16 1009 1 (15) > 17 1004 1 (16) > 18 966 1 (17) > 19 998 1 (18) > 20 979 1 (19) > 21 1010 1 (20) > 22 1013 1 (21) > 23 974 1 (22) > 24 1023 1 (23) > 25 983 1 (24) > 26 1051 1 (25) > 27 1027 1 (26) > 28 1054 1 (27) > 29 991 1 (28) > 30 993 1 (29) > 31 993 1 (30) > 32 991 1 (31) > 33 931 1 (32) > 34 942 1 (33) > 35 995 1 (34) > 36 953 1 (35) > 37 945 1 (36) > 38 1015 1 (37) > 39 1019 1 (38) > 40 956 1 (39) > 41 963 1 (40) > 42 1004 1 (41) > 43 1017 1 (42) > 44 1047 1 (43) > 45 1021 1 (44) > 46 1019 1 (45) > 47 1013 1 (46) > 48 986 1 (47) > 49 1056 1 (48) > 50 995 1 (49) > > > --- SQL operation complete. > > Layla Martin > HP DualStudy - Telefon +49 7031 4504682 - [email protected] > > -----Original Message----- > From: Qifan Chen [mailto:[email protected]] > Sent: Donnerstag, 6. August 2015 14:30 > To: dev > Subject: Re: Index Type > > Hi Layla, > > Glad to hear that you were able to see index plans for some of your data. > > Can you do a showstats for the index column and find out the interval(s) in > which values with frequency of 1 and 5? > > The compiler relies on the stats to figure out the cardinality and it is > possible these extreme low frequency values are buried in some of the > intervals and their low frequencies are not known spcifically. If that is > the case, we may need to increase # of intervals (say to 100 or even > higher) so that the intervals correctly represent the low frequency values. > The default is 50 to reduce the memory usage. > > BTW, the compiler does recognize an opposite case where some value has very > high frequency (say 10% of the total rows) and applies special joins (skew > buster) when necessary. > > showstats for table <T> on <column> detail; > > update statistics for table <T> on <column> generate <n> intervals; > > > Thanks --Qifan > > On Thu, Aug 6, 2015 at 2:25 AM, Martin, Layla (HP DualStudy) < > [email protected]> wrote: > > > Hi, > > > > > > > > Sorry for just another question … > > > > > > > > I got another question regarding Trafodion indexes. Which type of > > index do you use? > > > > > > > > I conducted a test with 50000 table rows and a varying frequency > > (number of different entries in the indexed column). > > > > > > > > This test has shown the best results (best benefit over not using an > > index) for frequencies 100 and 1000 (I’ve tested 1, 5, 50, 100, 1000, > > 10000 and 50000, the optimizer didn’t use the index for 1 and 5). > > > > I thought that B-Trees work best with a frequency close to log(number > > of rows), but the results have been best close to sqrt(number of rows). > > > > > > > > Do you have an explanation for that? > > > > I didn’t find any information on that in the docu/wiki or on Git. > > > > > > > > Best regards > > > > Layla > > > > > > > > *Layla Martin* > > HP DualStudy > > > > Hewlett-Packard GmbH > > > > Telefon +49 7031 4504682 > > > > *[email protected] <[email protected]>* > > > > Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy > > > > [image: Description: Description: > > http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogo > > Small.png] > > > > Follow us on: [image: facebook] <https://www.facebook.com/DualStudy.hp> > > [image: > > twitter] <http://twitter.com/hpdualstudy> [image: youtube] > > <http://www.youtube.com/user/hpDualStudy> > > > > Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard > > Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst > > Reichart Vorsitzender des Aufsichtsrats: Jörg Menno Harms Sitz der > > Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081 > > WEEE-Reg.-Nr. DE 30409072 > > > > > > > > > > > > > > -- > Regards, --Qifan > -- Regards, --Qifan
