Hello Heikki,

Monday, February 26, 2001, 5:08:52 PM, you wrote:

HT> Hi!

HT> There were more questions about Innobase:

>>Here is the next question - can you say some more about the space
>>allocation concepts used in Innobase? For example, how would it handle
>>a huge number (10,000+) of tables, and how space allocation

HT> The data dictionary is hashed, so a huge number of tables is no problem.
HT> For small tables, space is allocated one database page at a time
HT> (by default 16 kB). For big tables, space is allocated in chunks of
HT> consecutive pages, up to 256 pages big.
Good. This sounds like it should reduce fragmentation.
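
Just to check that I understand the policy, I imagine something like the
following sketch in C (the switch-over threshold and the growth factor
are my guesses, not anything you have stated):

    /* Hypothetical sketch of the allocation policy described above.
       SMALL_TABLE_PAGES and the growth factor are assumptions for
       illustration only, not documented Innobase constants. */
    #include <stddef.h>

    #define PAGE_SIZE          (16 * 1024)  /* default database page, 16 kB */
    #define MAX_EXTENT_PAGES   256          /* biggest chunk of consecutive pages */
    #define SMALL_TABLE_PAGES  32           /* assumed switch-over point */

    /* Decide how many consecutive pages to reserve when a table needs room. */
    size_t pages_to_allocate(size_t pages_in_table)
    {
        if (pages_in_table < SMALL_TABLE_PAGES)
            return 1;                        /* small table: one page at a time */

        /* big table: grow in larger chunks, capped at 256 pages */
        size_t chunk = pages_in_table / 8;   /* assumed growth factor */
        if (chunk > MAX_EXTENT_PAGES)
            chunk = MAX_EXTENT_PAGES;
        return chunk;
    }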

Anyway, looking at the benchmarks, the table creation test currently
runs about 5 times faster on MyISAM than on Innobase, while Innobase is
itself about 5 times faster than BDB. Maybe the speed difference is
because of the interface or transaction handling...

>>is done. I know Innobase supports clustering, but if I'm not

HT> Yes, all tables in Innobase are clustered by their primary key, that is,
HT> rows are physically in the order of their primary key. If you do not
HT> specify a primary key, then a generated row id is internally used as
HT> the primary key: the rows will be ordered according to their insertion
HT> sequence. The support for the Oracle-like clustering, physically mixing
HT> several tables, is built in but I do not have a user interface for it.
Hm. How does this fit with one page being in use by only one table (or
am I mistaken)?

>>mistaken, only for rows from one table. The next thing: is there any
>>difference in handling dynamic and static rows, and how is
>>fragmentation avoided for the latter?

HT> You probably mean what happens if the size of a row changes?
HT> Innobase uses no overflow pages. If a row is updated so that it
HT> cannot fit on the page, we split the B-tree page to get more space.
HT> This will of course slow down table scans. Recreating tables where
HT> this happens may be sensible from time to time.
A better way would be to have the possibility of online optimization.
If you have an online backup feature, this should also be possible - the
idea is almost the same.

Another thing: you probably keep some space free on each page so that
new rows can be inserted into the same page? Another question is what
you do with the holes when, for example, a row grows and you have
copied it to another location. What happens if there would be enough
room on the page for a row, but the free space is fragmented? Do you
somehow defragment the page?
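
By defragmenting a page I mean something like this sketch (my own
illustration, not your code): copy the live records contiguously into a
scratch buffer and write them back, so the free space becomes one hole:

    /* Illustration of in-page defragmentation (not Innobase code):
       compact live records so the free space becomes one contiguous hole. */
    #include <string.h>

    #define PAGE_SIZE (16 * 1024)

    struct record { size_t offset; size_t len; };   /* slot directory entry */

    /* Returns the new start of contiguous free space on the page. */
    size_t compact_page(unsigned char *page, struct record *dir, size_t nrecs)
    {
        unsigned char scratch[PAGE_SIZE];
        size_t used = 0;

        for (size_t i = 0; i < nrecs; i++) {        /* pack records end to end */
            memcpy(scratch + used, page + dir[i].offset, dir[i].len);
            dir[i].offset = used;
            used += dir[i].len;
        }
        memcpy(page, scratch, used);                /* all space after 'used' is free */
        return used;
    }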


>>As far as I know, some space allocation problems are quite different in
>>file systems and in databases, so I'm surprised you're speaking about a
>>similar algorithm.

HT> Yes, in file systems you cannot grow a file in the middle, only at the
HT> end.

That is not the only difference :) For example, file systems usually
work with a large number of files while databases have a limited number
of them, and a database does not have directories. A database also has
different object types (table data, indexes) which have somewhat
different access methods.

>>Other thing is - does innobase supports hash indexes internally ?

HT> Yes, it automatically creates hash indexes, no user configuration is
HT> needed.

Hm. What do you mean by this - it somehow tracks the queries and
creates indexes of the needed type automatically? This is a good
feature, but someone will anyway want to turn it off, wanting to decide
on the index layout himself.

I'm speaking about user-level hash indexes - for example, if I'm using
an index on a 200-byte string for lookups only and never do range scans
on it, I can save much space and gain much speed using a hash index,
whereas in the other case I would want a B-tree one.
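
Just to illustrate the space difference I mean (an illustration only,
not how any real index page is laid out): a B-tree entry must carry the
full key to preserve ordering, while a hash entry needs only a
fixed-size hash value and a row pointer:

    /* Why a hash index can be much smaller than a B-tree on a long key
       (illustration only): the B-tree keeps the full 200-byte key in
       every entry; the hash bucket keeps a fixed-size hash plus a row
       pointer, at the price of losing range scans. */
    #include <stdint.h>
    #include <stdio.h>

    struct btree_entry { char key[200]; uint64_t row_id; };  /* ~208 bytes */
    struct hash_entry  { uint64_t hash; uint64_t row_id; };  /*   16 bytes */

    int main(void)
    {
        printf("btree entry: %zu bytes, hash entry: %zu bytes\n",
               sizeof(struct btree_entry), sizeof(struct hash_entry));
        return 0;
    }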

>>Another thing - can I, instead of adding files, increase their size?
>>How will Innobase handle it if I do so?

HT> Currently, you cannot increase the size of a data file; you have
HT> to add a new one. Increasing the size of the last data file would of
HT> course be technically easy to implement.

Yes. That's not the problem. The question is what Innobase will do in
this case - take the actual length, or panic?

>>A few words about creation. As I looked at the Innobase files, they are
>>mostly zero-filled when created. I think it would be nice to add the
>>possibility to create files not by write() but by lseek(): this will
>>make creation faster, and the files will take less space on file systems
>>which support holes in files - the latter is good as you can just create
>>huge files at the very beginning and not care much about wasting the space.

HT> The problem is that the OS may fragment the files if they are not
HT> physically allocated.

Yes, of course - but some people may not need real speed from Innobase;
they just want transactions with MySQL and do not want to bother much
with space. So I think this may be good as an option for some users -
not a really important one, of course. It is also useful for debugging
purposes - currently it's quite annoying to wait while the 512MB
database file required for a test is being created....
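
What I have in mind is roughly the following (assuming a Unix-like OS
and a file system that supports holes; the file name is made up):

    /* Sketch of creating a 512 MB "sparse" data file on a file system
       that supports holes: seek to the last byte and write one zero
       byte.  The file gets its full logical size but occupies almost
       no disk blocks. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        const off_t size = 512L * 1024 * 1024;      /* 512 MB logical size */
        int fd = open("ibdata_test", O_CREAT | O_WRONLY, 0660);
        if (fd < 0) { perror("open"); return 1; }

        if (lseek(fd, size - 1, SEEK_SET) < 0 ||    /* jump over the hole */
            write(fd, "", 1) != 1) {                /* materialize the last byte */
            perror("sparse create");
            close(fd);
            return 1;
        }
        close(fd);
        return 0;
    }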

>>>D) can you safely add new files when there is data in them already?
HT>> Yes you can, but you have to shut down the database first.
>>Surely a special command may be added for this later :)

HT> Yes, in Oracle you can add a data file while the database is up
HT> (ALTER TABLESPACE ADD DATAFILE), and it is easily implementable.

Here we come to one problem again - you can't do this, as you have no
place to store the information about the new file: you can't just
rewrite the config file.

>>Another thing - don't you think it's not quite a good idea to store the
>>database configuration in the config file? For now it's quite simple and
>>can be easily recovered by looking at the files and their sizes, but
>>when you have a more complex configuration (i.e. several tablespaces
>>with external mirroring) it will be hard to recover.

HT> You have to keep backups of .cnf files also.

Yes, of course. But I think it is not for nothing that most databases
store such data internally (in a special table). Another example is
Linux software RAID - the new version has moved from storing this data
in a config file to storing it in the superblock, where the file in /etc
may just be used to override it.

>>Other question - files sizes. Does innobase itself support 4GB+ files?

HT> The support is built in, but not tested yet.

OK. I'll test it :)

>>>   4) Currently the ATIS test fails with Innobase tables:
>>> Retrieving data
>>> Warning: Query 'select
>>> city.city_name,state.state_name,city.city_code from state,city
>>> where city.state_code=state.state_code' returned 1 rows when it
>>> should have returned 11 rows
>>> Got error:  when executing select
>>> flight.flight_code,aircraft.aircraft_type from flight,aircraft
>>> where flight.aircraft_code=aircraft.aircraft_code
>>> got 0 instead of 579 ***
HT>> You may have an outdated version of the MySQL source distribution.
HT>> (I am currently exchanging lots of emails with Peter and looking at the
HT>> problem.)
>>I've currently patched the version but still get the same error. Other
>>tests (i.e. big tables) now pass.

HT> The Linux version runs the test without errors on my machine. Maybe
HT> the source repository you are using is still somewhat outdated.

OK, maybe. Another thing is the options: for example, BDB didn't work
for me because of a table cache of 4096 instead of 64 :). Do you commit
your changes to work.mysql.com?

HT>> physically writes its data files full at database creation, so that
HT>> the space is physically occupied.
>>Yes, of course. But I think this should be mentioned in the manual, so
>>users are not surprised that mysqld is doing something for several
>>minutes after startup before it's ready.

HT> Will be added to the manual, and to the message the database prints when
HT> a new data file is created.
OK. Thank you :)

>>Do you have plans for external RAID support, i.e. striping and
>>mirroring? At least in a raw device configuration I think this may give
>>better performance than OS/hardware RAID.

HT> No, I have no plans for simulating RAID in the database server;
HT> it is done on a lower level, like RAID disk hardware. We have
HT> to measure the speed of hardware RAID systems to find out if
HT> there is a difference compared to individual disks.

The thing is, it's always faster to perform this at a level which knows
more about the configuration. A good example: RAID implemented in the
OS is faster than a hardware one because it has more information to
work with. And the database knows even more about the data, so it may
gain even more from a multiple-disk configuration.

HT>> For best performance, you should create big files, so that disk
HT>> seeks are minimized.
HT>> If you lose a file, you must do an archive recovery from a backup,
HT>> using archived log files of MySQL, (not log files of Innobase).
>>This is a problem. Currently MySQL does not have a feature for online
>>backup (planned for 4.0) - so to make a really consistent snapshot of
>>my MySQL tables (20GB per server) I need to lock the system for about
>>half an hour, which is not acceptable in 24/7 mode.

HT> The online backup capability is built into Innobase.
HT> You could take an online backup from Innobase by switching the
HT> database to the online backup mode and just physically copying the
HT> database files, and also generated log files, just like in Oracle.
HT> But the problem currently is that the backup is not in sync with
HT> tables stored in non-Innobase formats, like MyISAM.

This is still good. As I remember, MySQL 4.0 will also have an online
backup feature, which could probably be synced with this one.
The worst thing here is that you have to copy all the data files
(gigabytes of data), where 90%+ of them do not change over a day. The
good way, of course, would be to have the possibility of incremental
backup...
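
The idea I mean is roughly this sketch (my illustration only): track
which pages changed since the last backup in a bitmap, and copy only
those:

    /* Sketch of incremental backup: copy to 'out_fd' only the pages
       whose bit is set in 'dirty'.  Each page is preceded by its
       8-byte page number so a restore knows where it belongs. */
    #include <stdint.h>
    #include <unistd.h>

    #define PAGE_SIZE (16 * 1024)

    int backup_changed_pages(int fd, int out_fd,
                             const unsigned char *dirty, uint64_t npages)
    {
        unsigned char page[PAGE_SIZE];

        for (uint64_t i = 0; i < npages; i++) {
            if (!(dirty[i / 8] & (1u << (i % 8))))
                continue;                              /* unchanged page */
            if (pread(fd, page, PAGE_SIZE, (off_t)(i * PAGE_SIZE)) != PAGE_SIZE)
                return -1;
            if (write(out_fd, &i, sizeof i) != sizeof i ||
                write(out_fd, page, PAGE_SIZE) != PAGE_SIZE)
                return -1;
        }
        return 0;
    }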


>>Another thing is the log files - it may take quite a long time to
>>process them (an additional problem: there is only one log file
>>stream); in my case I tried to recover with 3 days of logs and it took
>>about 3 days :)

HT> One could sort Innobase archived log files and do a fast recovery in a
HT> single sweep of the disk, but again we have the sync problem with
HT> other table formats.

So you mean you can recover really quickly with Innobase log files?


HT>> I will have to write CHECK TABLE (in Oracle called ANALYZE). I am not
HT>> sure if repair table is a good idea, Oracle DBA manual at least does
HT>> not mention such a procedure for Oracle. If the database gets
HT>> corrupted, then the safest method is to recover it from a backup which
HT>> is old enough so that corruption does not occur in the backup. Or recover
HT>> the tables from dumps and MySQL archived log files.
>>It's a bad idea to look at Oracle in this case. Look not only at the
>>high end but at the low end also. In my case with MySQL, most damage
>>affects only the indexes, so recovery is possible even without data
>>loss. Another thing is that not everyone has a complete backup (data +
>>all log files), so I think it's not a good idea to skip a recovery
>>tool. It's like with file systems - you usually do backups, but I don't
>>think you would like it if you could not fix your file system when it
>>got damaged.

HT> How do MySQL indexes get corrupted? Does it happen when the
HT> database crashes?
Yes of course :)
HT>  Since in Innobase there is recovery, a database
HT> crash usually does not corrupt data or index files. I will have to look
HT> at other mainstream databases like MS SQL Server and DB2, if they have
HT> repair tools.

Why are you looking at them in this case? It's quite upsetting if you
have to restore your backup because the database file was just a bit
damaged....

HT> If the data in a table is not corrupted, you can try to fix the
HT> indexes by dropping them and recreating them. But I still think this
HT> is somewhat risky.

Yes. A common way of repairing a table is just copying it to a
temporary table... But still, you need to be able to recover not only a
table but also the tablespace; the second is much more important and
harder... I can say that, for example, BerkeleyDB has such a tool.



HT> But I would not say Oracle is a 'high-end' database, something different
HT> from 'low-end' databases. I mean that all well written relational
HT> databases are technically rather similar. Databases of all sizes
HT> need about the same technical infrastructure in the database server
HT> code.
Well, the problem is that it's quite hard to have one database which is
the safest and the fastest for all types of applications. The simple
thing is, sometimes you do not care about your data (let's say it's a
generated table) - then the fastest way would always be to store it in
memory. If you use the data from only one application (embedded), you
also do not need some features which have their own overhead.
That's why I like the MySQL concept, where there are many different
table types and an embedded version in the works, or the BDB concept,
where there are a number of different products with different features.
That's why I'm speaking about these things - I know many users WILL NOT
back up their database because the data is not so critical, just as many
of us do not back up their home PC hard disks, but they will really want
to recover at least some of the data if something bad happens.

By a low-end database I mean a database which is quite often used in a
production environment on really low-end configurations, whereas one
who buys Oracle can afford to run it on a better system with hardware
RAID and a tape streamer for backups :)




>>The problem here is that, like in a file system, you can't just check
>>one file; you have to check the whole file system, as there could be
>>cross-links or bitmap errors. So you'll probably have to separate the
>>physically correct state (allocation and such stuff), which is checked
>>at the file level, from the logically correct state (i.e. the index
>>matches all rows), where the latter can be fixed at the table level.
>>Making this distinction could help, as it would allow recovering one
>>table while others are online.
>>Another thing - do you have any flags like MyISAM has, like 'crashed'
>>and so on?

HT> In Oracle you can recover from logs a single data file, while others are
HT> online. This would help a bit. Do you mean the flags that the MySQL check
HT> procedure sets on files?

Yes. This feature of recovering only one data file is good, but it
conflicts with the idea that larger files give more performance. Another
thing about physical disks - can you have several files on a single raw
device?

MySQL has many flags, though most of them of course do not make sense
for Innobase, as its tables are not tied to individual files and it
recovers itself automatically. The one which is interesting is marking
and then checking/repairing a table when some internal consistency check
has failed (i.e. a wrong page format is discovered).


>>>  8) As I understand, Innobase works with large files, managing the
>>>  space in them itself - so is it planned to support raw devices with
>>>  Innobase?
HT>> Yes, Innobase will support raw disks. I have to test it still.

>>Good. This should really give much performance.

HT> I am not sure if using raw disks helps a lot. I measured the speed difference
HT> for a disk scan on one machine, and I think it was less than 5 %. But the
HT> other benefit is that the OS cannot fragment a raw disk.

Well, fragmentation is one issue, and another is double buffering: you
have your own cache and the OS has its own, so you just waste memory;
and if you limit the OS cache too much, other programs on the machine
which need I/O will run slower. The last thing is that you have better
control over the I/O, so you can organize disk accesses in a way that
minimizes seeks. Of course, here you have much work to do to optimize
the I/O, but it may give a good benefit.
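
For example, on systems which support it, the database could open its
files or raw devices with O_DIRECT, so that pages are cached only once,
in its own buffer pool (a sketch only; O_DIRECT needs aligned buffers
and is not available everywhere, and the file name is made up):

    /* Sketch of bypassing the OS cache with O_DIRECT (where the OS
       supports it), so pages are cached only once, in the database's
       own buffer pool.  O_DIRECT requires the buffer, offset and size
       to be block-aligned. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>

    #define PAGE_SIZE (16 * 1024)

    int main(void)
    {
        int fd = open("ibdata_test", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, PAGE_SIZE) != 0) {  /* aligned buffer */
            close(fd);
            return 1;
        }

        ssize_t n = pread(fd, buf, PAGE_SIZE, 0);  /* read page 0, no OS cache */
        printf("read %zd bytes directly\n", n);

        free(buf);
        close(fd);
        return 0;
    }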






-- 
Best regards,
 Peter                            mailto:[EMAIL PROTECTED]


