RE: Optimization for faster select...

Jared Still Sat, 20 Dec 2003 20:21:38 -0800

Daniel,

Though you provided an execution path via explain plan, that
is really not very useful without some context.


You are outputting 165,000 rows, but you don't say how big
the table is from which they are being retrieved, and whether
or not the table is partitioned.

Using an index is not always a good idea, it depends on 
a number of things.  If your table or partition has an
average row length of 75-80 bytes, you will get about 100 rows
per block, requiring at least 1650 blocks.  Could be more, it
depends on the data you're selecting. If the data is stored
in the same order that you need it, it may make sense to use
the index.  If it is scattered throughout the table, it probably
doesn't make sense.

Note: assuming 8k blocks here

There are easy ways to determine which is best.  Use a 10046
level 8 trace and add up the IO for a query with the index, and
for one without.

This is getting a bit far afield for this list, so I'll stop there.

The point is, don't assume that a Full Table Scan is a bad thing.

I don't think it was mentioned directly in this thread, but it did
seem to be implied.

Jared



On Sat, 2003-12-13 at 12:34, Rozengurtel, Daniel wrote:
> David,Michael,Andy, Sam,Tom and Kristian - thanx very,very much for your
> incisive and knowledgeable answers/suggestions. I tried and learned A LOT
> from these. 
> Here are some of my discoveries and things I've done.
> 
> 1. I am using a CBO type of Database. TRADE_STATUS does not have an index at
> all. Column INST_MNEM has index GOVT_MONDAY_INDX_2. I would expect this
> column to hold about 150 distinct values (from which I'm interested only in
> one:'GRDCOMP'), whereas TRADE_STATUS to have only 4 to 8 distinct values. So
> I think there is a high selectivity ratio on INST_MNEM column with about 90K
> records, and I should be ok. 
> 
> 2.  I played around with RowCacheSize along with analyzing table and
> indexes:
>   - Before analyzed index/table:
>       With no RowCacheSize set up the extract took 34 secs, using about
> 5mb RAM
>       With RowCacheSize =>0 (automatically select what's best), the
> extract took 33 secs, with 5mb RAM 
>       With RowCacheSize =>100, the extract took 8 secs, using about 6mb
> RAM (THE BEST RESULTS EVER)
>   - After analyzed index/table:
>       No change
>       No change
>       No change
> 
> 3.  Although its a cute idea to create a function based index on SUBSTR
> etc... I did not like it a lot cause based on clients requests for
> formatting data, I would need to use a lot of those functions
> (DECODE,INSTR,SUBSTR,NVL etc...). Creating all these indexes would have a
> huge impact on my initial load, when the table is created every day(its
> created by first truncation/inserts/updates)
> 
> 4. I tried to see whether the SUBSTR could be replaced with Perl formatting
> combining with both JOIN/MAP functions on 10K records.
>     - using SUBSTR with:
>       'map' extract took 30 sec - good results
>       'join' extract took 33 sec
>     - without using SUBSTR, but utilize Perl Formatting later with:
>       'map' extract took 38 sec - not as good as I hoped
>       'join' extract took 40 sec - not good at all
> 
> >From this test I could see that there is no BIG difference in join/map,
> though 'join' is cuter for not having extra '|' at the end. But it really
> DID NOT make any difference when I tried to avoid of use of SUBSTR from sql.
> 
> 
> 5. I took out the print statements (print OUTFILE join.....) from while
> ($sth->fetch) loop to see if code spends most of its time writing data, and
> let just loop over the datasets gotten from DB. No impact at all. Prints
> don't save me much time. In fact it don't matter at all. So that rules out
> the possibility of writes delaying the extract.
> 
> 6. SPOOL command was really good. To get those 10K records (not formatted
> text) took me about 7-8 seconds. Which shows the writes to Unix file is not
> an issue.
> 
> 7. The database is on the same physical Unix machine that my extractions
> run. So I am not sure if there is network delay of of some sort.       
> 
> 8. Also compared using bind_columns on a regular array instead of hash with:
>       RowCacheSize=>100, there was no difference at all 
>       RowCacheSize not set at all, the differences were almost
> insignificant. 'Array' way was couple of seconds faster as    expected.
> 
> So, ladies and gets, everything comes down to using RowCacheSize attribute.
> I don't have any experience with it. I know if you have a huge number set up
> to it, your other processes might end up with no Memory for them, or
> something like that. How dangerous is it to use? How do you know what
> optimal RowCacheSize you need for specific code? If this is a bottleneck in
> my case, and RowCacheSize helps me solve that issue, what does it mean the
> problem is? Low cache size Database, Unix???? If I use RowCacheSize=>0, does
> it mean Oracle/Unix has to find a safest and not necessarily best way of
> dealing with data being extracted?? Why then SQL SPOOL command works faster
> than that? 
> 
> The more I know, the more I know that I dont know a lot. :-) Would you help
> me consider these questions? 
> Again, Thanx A LOT to all of you who helped me understand better what I need
> to do.
> 
> Best regards,
> 
> Daniel
> 
> -----Original Message-----
> From: David N Murray [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 12, 2003 12:23 AM
> To: Rozengurtel, Daniel
> Cc: '[EMAIL PROTECTED]'
> Subject: Re: Optimization for faster select...
> 
> 
> Sounds like a great test!  Why don't you let us know which way is faster:
> doing the substr's on the DB or in perl (my bet is perl).  For that
> matter, you could also test to see if join('|', @row) is faster.  You
> could certainly move the "\n" up to the first print line (via . or ,) and
> cut out a whole perl statement.  But then, there alway more than one way
> to do it ;-)
> 
> I don't normally use the bind* calls.  I typically just use fetchrow_array
> or fetchrow_arrayref for large result sets.  But, I just read page 127 of
> our favorite book and your way is probably faster than mine.
> 
> For testing purposes, you could always insert a cut-off (say 10K rows) and
> time it on a smaller scale.  That way each test won't take 35min.  You
> could also take the print out of the loop and just see how long it takes
> to pull the data from the DB.  I imagine that's where your spending most
> of your time.  Maybe tuning the (Oracle?) client interface is the place to
> look.  As Michael points out, EXPLAIN PLAN might be beneficial on your
> query.
> 
> Finally, would performance be improved if you used bind_columns on a
> regular array instead of a hash? (see join(), above.)  It would save the
> overhead of computing the index (hash).
> 
> Just my $0.02.
> Dave
> 
> On Dec 11, Rozengurtel, Daniel scribed:
> 
> > Hello All,
> >
> > I am trying to optimize my code to work faster in selecting about 30
> columns
> > from a denormolized table. The result set of 165,000 records is put to a
> > file on Unix in about 35-40 minutes. I have tried to follow the guide
> lines
> > from Tim's recent presentation on DBI (DBI_AdvancedTalk_200307.ppt) to
> > achieve fast results. The code is working absolutely fine utilizing an
> index
> > on that table (INST_MNEM), however I was wondering if anyone can suggest a
> > faster and better approach to do the same thing. I know selecting 30
> columns
> > can and will affect performance but still....
> > Does anyone know if a specific function in select stmt affects the
> > performance and how? (i.e. SUBSTR, INSTR) Is it better to parse it in
> Perl?
> >
> > Thanx much for your help,
> >
> > Regards,
> >
> > Daniel
> >
> > $pfd_sql="SELECT
> > CUSIP, ISIN, SUBSTR(FXTICKER, 1, INSTR(FXTICKER, ' ', 1) -1) AS TICKER,
> > SECURITY_DESC, ' ' AS IS_DOLLAR_PFD,
> > ...
> > ...
> > ...
> > FROM GOVT_TABLE WHERE INST_MNEM='GRDCOMP' AND TRADE_STATUS='ACTV' ";
> >
> > my  %row;
> > # first get all the Preferreds into a file
> > my  $sth = $dbh->prepare_cached($pfd_sql) || die $dbh->errstr;
> >     $sth->execute;  $sth->bind_columns( \( @[EMAIL PROTECTED] ));
> >
> > while($sth->fetch) {
> >     print OUTFILE map "$row{$_}|", @ClnFldsArray; # print each row with
> > | as delimiter
> >     print OUTFILE "\n";
> > }#while
> >
> >
> > ____________________________________________________________________
> >
> > IMPORTANT NOTICES:
> >           This message is intended only for the addressee. Please notify
> the
> > sender by e-mail if you are not the intended recipient. If you are not the
> > intended recipient, you may not copy, disclose, or distribute this message
> > or its contents to any other person and any such actions may be unlawful.
> >
> >          Banc of America Securities LLC("BAS") does not accept time
> > sensitive, action-oriented messages or transaction orders, including
> orders
> > to purchase or sell securities, via e-mail.
> >
> >          BAS reserves the right to monitor and review the content of all
> > messages sent to or from this e-mail address. Messages sent to or from
> this
> > e-mail address may be stored on the BAS e-mail system.
> > __________________________________________________________________
> >
>

RE: Optimization for faster select...

Reply via email to