Shawn,

Thanks for helping on this. I really appreciate it.

No problem!!

Please post the structures of your "big_table" and your "url_table"
(whatever you called them), and I will help you rewrite step 4 to count
how many times a URL appears in the "big_table".

mysql> desc internet_usage;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| uid    | varchar(10) | YES  | MUL | NULL    |       |
| time   | datetime    | YES  |     | NULL    |       |
| ip     | varchar(20) | YES  |     | NULL    |       |
| action | varchar(20) | YES  |     | NULL    |       |
| urlid  | int(11)     | YES  |     | NULL    |       |
| size   | int(11)     | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
6 rows in set (0.03 sec)

mysql> desc url_table;
+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| urlid    | mediumint(9) |      | PRI | NULL    | auto_increment |
| url      | text         | YES  | MUL | NULL    |                |
| hits     | mediumint(9) | YES  |     | NULL    |                |
| category | varchar(50)  | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)

mysql>
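
Based on your step 4, I had a stab at the counting part myself, though
I haven't dared run it yet. This is only a sketch -- url_counts is a
scratch table name I made up, and I'm assuming the multi-table UPDATE
syntax (MySQL 4.0.4 and up) is available here:

# count how many times each urlid appears in the big table
# (url_counts is just a scratch name I invented)
CREATE TABLE url_counts
  SELECT urlid, COUNT(*) AS cnt
  FROM internet_usage
  GROUP BY urlid;

# copy the totals across to the hits column
UPDATE url_table u, url_counts c
  SET u.hits = c.cnt
  WHERE u.urlid = c.urlid;

DROP TABLE url_counts;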


Is this bulk import a process you need to do repeatedly? If so, we need
to worry about updating the count column on the next batch import, not
just re-creating it from the new data. That will change the query
significantly.

Yes, it will be a repeated process. Actually, I have a backlog of 6
months' data to load!
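
For the repeat loads, would something along these lines be sane? It's
untested, and batch_usage is a hypothetical staging table I'd LOAD DATA
each new batch into before merging it:

# per-batch totals (batch_usage is a made-up staging table)
CREATE TABLE batch_counts
  SELECT urlid, COUNT(*) AS cnt
  FROM batch_usage
  GROUP BY urlid;

# add this batch onto the running total instead of recomputing it;
# IFNULL covers urls that have never been counted before
UPDATE url_table u, batch_counts c
  SET u.hits = IFNULL(u.hits, 0) + c.cnt
  WHERE u.urlid = c.urlid;

DROP TABLE batch_counts;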


Here's an example of what the data looks like:

mysql> select * from internet_usage limit 5;
+---------+---------------------+----------------+--------------+-------+------+
| uid     | time                | ip             | action       | urlid | size |
+---------+---------------------+----------------+--------------+-------+------+
| n58396  | 2004-06-07 21:12:16 | 21.38.25.204   | TCP_TUNNELED |  5999 | 5297 |
| u344584 | 2004-06-07 21:07:12 | 21.33.136.74   | TCP_HIT      |  4494 |  438 |
| -       | 2004-06-07 21:07:02 | 21.38.92.76    | TCP_NC_MISS  |  2134 |  771 |
| u524797 | 2004-06-07 21:03:27 | 21.32.25.41    | TCP_NC_MISS  |   260 |  582 |
| -       | 2004-06-07 21:09:13 | 21.201.130.240 | TCP_HIT      |  3112 |  542 |
+---------+---------------------+----------------+--------------+-------+------+


mysql> select * from url_table limit 5;
+-------+-----------------------------------------------------------------------+------+---------------+
| urlid | url                                                                   | hits | category      |
+-------+-----------------------------------------------------------------------+------+---------------+
|     1 | http://www.bbc.co.uk/horoscopes/chinesehoroscopes/images/hp-snake.gif | NULL | Entertainment |
|     2 | http://www.call18866.co.uk/images/logo.jpg                            | NULL | none          |
|     3 | http://m2.doubleclick.net/866421/0409_santoku_250.gif                 | NULL | none          |
|     4 | http://lysto1-dc02.ww.ad.ba.com/                                      | NULL | Travel        |
|     5 | http://www.aboutscotland.com/edin/newstreetph/sitview.jpg             | NULL | Travel        |
+-------+-----------------------------------------------------------------------+------+---------------+
5 rows in set (0.00 sec)



One other problem I'm having is making the rows in internet_usage
unique. At the moment I have lots of duplicates, and when I tried to
create a temporary table of the distinct rows I got error 27 (I think
this refers to a 2GB file-size limit).


mysql> CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM internet_usage;
ERROR 1030: Got error 27 from table handler


Is there another way of doing this?
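
Error 27 is the operating system telling MySQL the file got too large
(run "perror 27" and it reports "File too large"), so the DISTINCT copy
is blowing past a file-size limit. One trick that avoids copying the
whole table into a second one is to let ALTER IGNORE de-duplicate in
place. Fair warning, this is only a sketch: it still rebuilds the table
on disk, and because a UNIQUE key treats NULLs as distinct, duplicate
rows where uid is NULL will not be collapsed:

# IGNORE throws away any row that would violate the new unique key,
# so exact duplicates are deleted while the table is rebuilt
ALTER IGNORE TABLE internet_usage
  ADD UNIQUE KEY dedup (uid, time, ip, action, urlid, size);

If it is MyISAM's default 4GB cap you are hitting rather than the
filesystem, raising MAX_ROWS with ALTER TABLE may also help; nothing
gets you past a 2GB limit imposed by the filesystem itself, though.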


Yours,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

>
>Mos forgot to populate the url_id column in your user table. I would use
>his same process but re-arrange it like this:
>
>1) create table BIG_TABLE....
>2) load data infile....
>3) create table URL_TABLE (
>       url_id bigint not null auto_increment unique,
>       url varchar(255) not null primary key,
>       category ....
>       )
>4) INSERT IGNORE URL_TABLE (url, category)
>       SELECT url,category
>       FROM BIG_TABLE
>5) create table USER_TABLE (
>       user_id varchar?(...) primary key,
>       date_time datetime,
>       url_id bigint,
>       size int
>       )
>
>6) insert USER_TABLE
>       SELECT bt.userID, bt.datetime, u.url_id, bt.size
>       FROM BIG_TABLE bt
>       INNER JOIN URL_TABLE u
>       ON u.url = bt.url
>
>doing it this way lets the SQL engine handle the conversion of URLs to
>their IDs in the USER_TABLE...see? No scripting required at all!
>
>Yours,
>Shawn Green
>Database Administrator
>Unimin Corporation - Spruce Pine
>
>

Shawn,

Many thanks for your reply. It sounds like this might be quicker than
my perl script, which parses a 1GB log file and fills the database in
40 minutes at the moment (the actual parsing only takes 1.5 minutes).
There's one snag I forgot about: the url_table has another column
called hits, which holds the number of hits for each url. I'm terrible
at SQL and wondered if you might be able to suggest a way of doing this
with the method above?
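
The best I could manage on my own is the sketch below -- completely
untested, and url_counts is just a scratch name I made up. The idea is
to give URL_TABLE a hits column in your step 3 and then fill it with
one more grouped query at the end:

# extra column for step 3:  hits bigint null
# then, after step 6, tally the raw load and copy the totals across
CREATE TABLE url_counts
  SELECT url, COUNT(*) AS cnt
  FROM BIG_TABLE
  GROUP BY url;

UPDATE URL_TABLE u, url_counts c
  SET u.hits = c.cnt
  WHERE u.url = c.url;

DROP TABLE url_counts;

Does that look sane, or is there a better way?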

Thanks,

JS.
