Re: Need Advice on Indexes

Sai Pullabhotla Fri, 26 Dec 2008 05:14:19 -0800

Thanks for letting me know how the indexes work. I will play with it
some more by changing the order of indexes/changing the select
statements and see what the optimizer does.


Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861
www.jMethods.com




On Thu, Dec 25, 2008 at 6:32 PM,  <[email protected]> wrote:
>
> Sigh.
>
>
> Sai,
>
> You post a question to this mailing list looking for help.
> The help you are getting is free so take it with a grain of salt.
> But also consider that the people who are taking the time to help you are
> seasoned database professionals who have a lot of expertise.
>
> You are trying to rationalize a bad design. We know that it is a bad design
> because you're here asking for help because your application does not
> perform well.
>
> I would suggest that you consider what is being said and then taking it back
> to your team so you can fix your design.
>
> Not having indexes is bad. Fixing this doesn't fix your poor design. It will
> improve some performance issues, but you'll not get the most from your
> database.
>
> I understand that you're trying to store your log data in to a database so
> that you can take advantage of what a database does best.
>
> But if you're going to take the time and effort to do this, why don't you do
> it right from the start and do the little extra work to make your life
> simpler in the long run.
>
> There's an expression... '... there is more than one way to skin a cat ...'.
> This is true when it comes to solving your problems in software.
>
> If you add a column to your user table a emp_status char(1), you can store
> values like 'A' for active, or 'N' for non active. Or if your database
> supports binary data types, True or False for active or non-active.
>
> You can then have your program get the correct user id based on username and
> emp_status. You then use this information when you insert the record in to
> your job table. This will 'shrink' the width of the columns. (I'm not going
> to suggest alternatives like triggers and stored procedures but they also
> work well...)
>
> You can then do the same for your project table.
>
> This should clean up your table design.
>
> Getting in to your questions about indexes...
>
> First, lets take a look at your query:
>> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> run_user, queue_priority, run_priority, submit_time, start_time,
>> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
>>
>
> I rewrote it in the following format.
>
> SELECT  job_id, proj_id, proj_name, proj_folder,
>        submit_user, run_user, queue_priority,
>        run_priority, submit_time, start_time, end_time,
>        time_taken, job_log, status
> FROM    dpa_job
> WHERE   start_time>=?
> AND     start_time<?
> AND     status IN(?, ?, ?, ?)
> ORDER BY start_time desc
>
> The reason you want to do this is that you now have visual cues to look at
> your query and see what is going on.  It doesn't take much more time and if
> you make this a habit, you'll get even faster at spotting problems.
>
> The SELECT clause shows you the columns that you're displaying.
> The FROM clause shows you where the data is coming from.
> The WHERE clause indicate what filters are being used to limit the data.
> (Its really the WHERE, AND, and OR statements.)
> This is where you need to focus your attention when you want to create
> indexes that your table will use.
>
> In this query, you are filtering on only two columns. So if you want to
> improve the query's performance, you want an index that is ordered on one or
> both of these columns. Since start_time is more unique than status, you'll
> want the index to order first on start_time, and then on status. You may
> want to not use status in the compound index unless you really have a lot of
> different status types. (It looks like you only have a handful of status
> types.
>
> If you have other queries, you'd want to perform the same exercise and see
> how the queries filter the datasets from each table and then see if you
> already have an index that could be used. If not, then you'd want to add an
> index.
>
> An example...
> Suppose you want to find all of the jobs submitted for a given project by a
> given user.
>
> Your WHERE clause would look like:
> WHERE  proj_id = ?
> AND    submit_user = ?
>
> If you want to limit the time stamp range you'd add the following:
> AND    start_time >=?
> AND    start_time <=?
>
> You could then index on proj_id, submit_user, and possibly start_time in
> that order.
>
> (Since you may have an index on start_time, you may want to check to see
> which index the optimizer is going to use. It is possible that the optimizer
> uses the index on start_time and then filters the data set by the proj_id
> and submit_user. This is going to be database dependent.)
>
> Since you have a database and data, you can use ij to create and drop
> indexes and test the queries to see how they perform.
>
> I would suggest that you look at a good relational database design book to
> see how to improve performance and how relational databases work.
>
> Way back when, I used C J Date's book in school.
> The text I used is way out of print, but here's a list of CJ Date's books
> that can be found on Amazon:
> http://www.amazon.com/s/qid=1230251379/ref=sr_pg_2?ie=UTF8&rs=1000&sort=date
> rank&keywords=Relational%20Database&rh=i%3Astripbooks%2Ck%3ARelational%20Dat
> abase%2Cp_27%3AC.%20J.%20Date&page=1
>
> This may be a good start:
> http://www.amazon.com/Database-Depth-Relational-Theory-Practitioners/dp/0596
> 100124/ref=sr_1_6?ie=UTF8&s=books&qid=1230251480&sr=1-6
>
> HTH
>
> -Mike
>
>> -----Original Message-----
>> From: Sai Pullabhotla [mailto:[email protected]]
>> Sent: Thursday, December 25, 2008 11:06 AM
>> To: Derby Discussion
>> Subject: Re: Need Advice on Indexes
>>
>> Mike,
>>
>> We went through all of these during the design phase and decided that
>> we do not have to do the kind of normalization you have mentioned.
>> Below are a couple of reasons:
>>
>> As stated before, my jobs table is nothing more than a plain LOG file
>> which is stored in a database table instead of a regular file on the
>> file system. The only reason why we needed to store it in the database
>> is for easy search/filtering of records. If I were to use a plain log
>> file, I don't think I would want to print a user ID (a number) to a
>> log file which does not make much sense to the users reading it. I
>> guess the same holds true regardless of the underlying data store
>> (plain file or database table or something else).
>>
>> About not really deleting a user record and having a "status" field in
>> the user table, we did go through this and here are the issues with
>> this approach:
>> Let us say my table definition is -
>> CREATE TABLE user(
>>     user_id integer primary key,
>>     user_name char(20),
>>     full_name varchar(50));
>>
>> We do want to have a unique constraint on the user_name field. If we
>> do not purge the user record when a delete is requested, then I cannot
>> have a new user with the same name as a user that was deleted 2 years
>> ago. Same is true even if you create a composite unique key with
>> user_name and status fields. Correct me if I'm wrong.
>>
>> You said that -
>>
>> > Based on this query, your index would never be used.
>>
>> Why not and how do I make Derby to use my index in various scenarios
>> I've mentioned.
>>
>> Thanks.
>> Sai Pullabhotla
>> Phone: (402) 408-5753
>> Fax: (402) 408-6861
>> www.jMethods.com
>>
>>
>>
>>
>> On Wed, Dec 24, 2008 at 4:10 PM,  <[email protected]> wrote:
>> > Sai, please see my comments in yours...
>> >
>> >> -----Original Message-----
>> >> From: Sai Pullabhotla [mailto:[email protected]]
>> >> Sent: Wednesday, December 24, 2008 1:54 PM
>> >> To: Derby Discussion; [email protected]
>> >> Subject: Re: Need Advice on Indexes
>> >>
>> >> Rick and Mike,
>> >>
>> >> Thanks for taking time to reply to my question.
>> >>
>> >> As far as normalizing the table that Mike suggested -
>> >>
>> >> this jobs table is nothing more than a history table. So, a project or
>> >> user exists today but may not exist tomorrow. However, we want to
>> >> maintain the history of every project that was executed (this is
>> >> considered a job and hence the name). Since, a user may not exist in a
>> >> future time, I can not maintain a foreign key to my user table even
>> >> though I've a user table and an ID column in it. Same with projects. I
>> >> hope it makes sense now why this table is a flat table. However, I
>> >> think I can move the remarks column into its own table and use a
>> >> longvarchar or clob field.
>> >>
>> > Ok,
>> > That's not a good design. Your user information should be stored
>> separately
>> > and have an id associated with it. You don't want to purge this
>> information;
>> > however you can associate a status field as part of this table. So you
>> > retain and don't reuse the ids and you can still keep historical data
>> > intact.
>> >
>> > One of the problems you may have is how you populate the field with the
>> > employee's name. What happens if you have two employees with the same
>> name?
>> > (Yes it happens. ;-)
>> > The other problem is what happens when the person entering the name
>> > misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)
>> >
>> > In addition, your table isn't normalized when you clearly have a 1 to
>> many
>> > relationship of data. Conceptually there are multiple jobs to a project.
>> So
>> > when you store jobs and projects in the same data, its not normalized.
>> >
>> > Note: There are times when you break from using 3rd normal form, however
>> it
>> > requires a lot of practical experience to justify it.
>> >
>> >
>> >> Now, as far as the queries go, Most of the times, it is -
>> >>
>> >> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> >> run_user, queue_priority, run_priority, submit_time, start_time,
>> >> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> >> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
>> >>
>> > Ok,
>> > Not to nitpick, but I'm a bit old school and when writing a query I tend
>> to
>> > want to see it formatted to be easier to read. Taking your query...
>> >
>> > SELECT  job_id, proj_id, proj_name, proj_folder,
>> >        submit_user, run_user, queue_priority,
>> >        run_priority, submit_time, start_time, end_time,
>> >        time_taken, job_log, status
>> > FROM    dpa_job
>> > WHERE   start_time>=?
>> > AND     start_time<?
>> > AND     status IN(?, ?, ?, ?)
>> > ORDER BY start_time desc
>> >
>> > It's a bit easier to read. The key words are capitalized and you can see
>> > your queries.
>> >
>> > Now looking at your common query example, the only thing you need to
>> index
>> > on is the start time column. You can add status to the query, however
>> its
>> > not going to really impact the performance of the query... unless you
>> really
>> > have a lot of rows. (Like more than 10,000 rows in the result set if you
>> > ignore the status filter.)
>> >
>> > Does the IN clause effect the performance? There was an issue earlier in
>> > Derby, but I think it has been fixed for a couple of versions. In your
>> > example query with only 4 values, it is not that big of an issue.
>> >
>> > Based on this query, your index would never be used.
>> >
>> > You've indicated that this is a production database with customers using
>> it.
>> > I'm going to strongly suggest that your company hire a competent DBA.
>> You've
>> > said a couple of things that are major red flags.  When I say DBA, I
>> mean
>> > someone who could be considered a logical DBA and has both data modeling
>> and
>> > app development expertise. If not a FTE, get a consultant in to do a
>> health
>> > check.
>> >
>> >
>> >
>> >> Sai Pullabhotla
>> >> Phone: (402) 408-5753
>> >> Fax: (402) 408-6861
>> >> www.jMethods.com
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>
>
>

Re: Need Advice on Indexes

Reply via email to