Re: Slow join... Big table

Mike W. Baranski Thu, 12 Jul 2001 06:34:02 -0700
Don Read wrote:
> 
> On 11-Jul-01 Mike W. Baranski wrote:
> > Awsome answers!  I'll clear a few things up where appropriate
> >
> > Don Read wrote:
> 
> <snip>
> 
> >>
> >>
> >> Why the LEFT JOINS ?
> >>  They're (mostly) used for finding set membership; whenever you see
> >>  'LEFT JOIN b' without s following 'b IS [NOT] NULL' clause 'tis the
> >>  clue that you prolly should be using something else.
> >>
> >> Perhaps STRAIGHT_JOIN, or SELECT DISTINCT ... JOIN
> >> is what you want (or at least give the same result).
> >>
> >
> > Yeah, sorry, I cut the joins out, they're about like you imagined...
> >
> > It needs to be done like this, because even if status/dept/source_host/
> > etc are unknown we need to display the records...
> >
> 
> Ok, understand; one the cases where the generalization doesn't hold.
> 
> > It's a security reporting app, so even if someone with an unknown
> > bid/department scans a badge we need to log it and include it.  That's
> > why the left join, then we just set the default of
> > badge_history_resolved.department to 'Unknown' or whatever, and make it
> > not null.  The logic of the query is correct (I believe) and it's being
> > driven by an import program coming from Informix written in c, so
> > asthetics are not so important...
> >
> >> > The describe statement looks like:
> >> >
> >> > +---------------+--------+---------------+---------+---------+-----+
> >> >| table         | type   | possible_keys | key     | key_len | ref
> >> >| | rows    | Extra |
> >> > +---------------+--------+---------------+---------+---------+-----+
> >> >| badge_history | ALL    | NULL   | NULL | NULL | NULL| 7073329
> >> >| badge_type    | ALL    | PRIMARY| NULL    |    NULL | NULL  | 4 |
> >> >| badge         | ref    | bid    | bid     |       4 | badge_history.bid
> >> >| |1
> >> >| badgests      | eq_ref | PRIMARY| PRIMARY | 68 |
> >> >| badge_history.status,badge_history.source_host | 1
> >> >| area          | eq_ref | PRIMARY| PRIMARY |68 |
> >> >| badge_history.area,badge_history.source_host   |    1 |
> >> >| department    | eq_ref | PRIMARY,id| id  | 4 | badge_history.dept|1 |
> >> > +---------------+--------+---------------+---------+---------+-
> >> >
> >>
> >> How many rows in badge_type ?
> >
> > 4 rows, and I've indexed them every way I can think of (unique on id,
> > unique on (id, source_host) and still can't get it down to an eq_ref,
> > which it should be.  It's a simple table, with just id        description
> > source_host fields and 4 rows or so.
> >
> 
> That should be perfect, and prolly won't hit the disk (get the answer out
> of the index cache).
> Are the id & source_host declaration exactly the same as history ?
> Otherwise there could be some type-cast shuffle going on behind the scenes.
They're the same, int(4) and varchar(64), respectively.  I'm going to
change the varchar(64) to varchar(32), I think...

> 
> >>
> >> Since you are running a full scan on the 7 million rows in badge_history,
> >> that first JOIN is pretty important.
> >> Do you have a key on badge_type (source_host,id) ?
> >
> > yes, unique on id, source host, rather than source_host, id...
> >
> >>
> >> > This join takes over 24 hours to run, and as you can see I'm using
> >> > indexes
> >> > and things.  Does anyone have any suggestions on how to speed this up?
> >> >
> >>
> >> 7,000,000 x foo is going to be a fairly large number.
> >>
> >> Reducing foo :
> >>
> >>   Break out the first JOIN (badge_history,badge_type); save resultset
> >>   in a temp table.
> >>   Complete the final table by selecting on the temp and the other
> >>   tables (which look to be well indexed).
> >>
> >> Reducing 7,000,000 :
> >>
> >>    # put key on the department, if not already indexed
> >>    $qry="ALTER TABLE history_badge ADD KEY idx_d (dept)";
> >>    SQLQuery($qry);
> >>
> >>    foreach $dept (1..25) {           # 25 departments in this example
> >>      $qry="INSERT INTO final_table
> >>         SELECT ...
> >>         FROM history_badge AS hist ... JOIN ... JOIN ...
> >>         WHERE hist.dept=$dept";
> >>       SQLQuery($qry);
> >>    }
> >>
> >> > Is it faster if I create the badge_history_resolved table, set up the
> >> > indexes, and then do the inserts, or should I create the table, do the
> >> > inserts, and then add all of hte indexs at the end?
> >>
> >> Add INDEX afterwards.
> >>
> >
> > Thought so, that's one of the reasons it ran so slowly...
> >
> 
> That'll do it.
Much Faster, exponentially even...
> 
> >> >
> >> > Also, a somewhat related problem, I seem to behaving some trouble with
> >> > the c
> >> > API.  This program ports a large database from Informix, and some of the
> >> > queries (creating indexes mostly) return errors, and it seems tor eally
> >> > botch the mysql thread.  Example:
> >> >
> >> > string query = "alter table badge change bid bid varchar(18) not null,
> >> > add
> >> > unique(bid)"
> >> >
> >>
> >> what is bid before this ?
> >
> > Since this is a port, the types must match as much as possible, the
> > logic of the c program sets the type based on the Informix type coming
> > out...
> >
I basically have logic in the import, using the informix esql/c stuff,
that determines the type of the col in informix, and then finds the best
approximation in mysql.  Therefore, column types that are the same in
Informix should be the same in mysql, i.e. badge.bid.type =
badge_history.bid.type and so forth.  I don't want to change it b/c I
wrote it 6 months ago and it works, and I have no idea how anymore...

> 
> ? 'splain please.
> 
> <snip>
> 
> >>
> >> Example bid(s) please.
> >>
> 
> <snip>
> 
> > No duplicates, I know that for sure, and not just this query, I get it
> > at different places.  Is there a limit with the c api as to how many
> > change xxx yyy zzz, change...., change..., .... you can have with the c
> > api, there appears to be...
> >
> 
Good idea, I'll try that...
I don't get any complaints about string size, and it's using the STL and
GCC 2.96, and some of the statements are very short, so I doubt that...

How do I catch the error code, and where do I map the codes to something
that makes sense?

>  max_packet_size
>    and your C compiler my have some limit on string size ...
> 
> > The thing is, these lines show up in the error log, and I cut and past
> > them into the command line client, and they work, this is why I"m
> > stumped...
> >
> 
> Weird, but withou the error code, might as well check the tarot cards.
> 
> >>
> >> >
> >> > Thirdly, Could someone suggest some memory sizes in teh my.conf file for
> >> > the
> >> > above join?  What are the best settings?
> >> >
> >>
> >> Fix query => tune index(s) => tune buffers.
> >>
> >> > This is a 2 Proc RedHat 7.1 Machine w/ 256 MB of RAM, pretty much
> >> > dedicated
> >> > to this app.
> >>
> >>  ... little light on the RAM, methinks, what does 'top' say ?
> >>       (a cold beverage sez you're deep in the weeds)
I think so too, but I'm of the persuasion that more memory would be much
better than more disk(s), no?

Man, I"m really miles ahead of where I was yesterday!  What a big help.
> >
> > depends on the my.cnf file, I've given it enough memory so that it swaps
> > some of hte stuff it's not using out, but mysql is not swapping.  Also,
> > it only has 1 hdd, how much of a factor is this.
> >
> 
> If IDE :
> might stuff-up your swap some, it'll help to get another controller &
> drive for swap & tmp tho ...
> 
> > Thanks, that was extremely helpful...
> 
> You're welcome.
> --
> Don Read                                       [EMAIL PROTECTED]
> -- It's always darkest before the dawn. So if you are going to
>    steal the neighbor's newspaper, that's the time to do it.

-- 
*********************************
Mike W. Baranski
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Web  : http://www.secmgmt.com
Phone: 919-788-9200
Fax  : 919-510-0037

P.O. box 30099
Raleigh, N.C. USA 27622
*********************************

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Slow join... Big table

Reply via email to