On Tue, Sep 6, 2011 at 11:36 PM, Mike Matrigali <mikem_...@sbcglobal.net> wrote:
> The encryption points work at a page level, and the system counts on the
> number of bytes staying the same, because we find pages by page number *
> number of bytes per page. If you decide to go at it from this level you
> will need to implement an underlying filesystem to map the pages. I don't
> think this is very interesting, as I believe you can get this effect "for
> free" on a number of OSes by just picking a compressed filesystem and
> putting Derby on that filesystem.
>
> If I were working on this, I think I would look at the point where each
> datatype that you are trying to compress is read from and written to disk.
> Start with looking at the various "readExternal*" and "writeExternal*"
> routines for each datatype. Start by understanding the current on-disk
> formats of the datatypes and then propose the new on-disk formats.

Is there any way to do compression before executing the query, for example
during query parsing or query optimisation? We want to investigate whether
the compression affects query execution speed.

> Note the result of this work would not be appropriate for submission as
> is; a complete project would suggest how the user would control whether or
> not to compress, and the final format should allow the system to tell the
> difference between the formats. There are many options here; for instance,
> we could track compression at the following levels:
>   per single column value
>   per single column in a table (i.e. metadata indicates the column is
>     compressed in this table)
>   per all columns in a table (i.e. metadata indicates all columns in the
>     table are compressed)
>   per database (i.e. metadata in the database says all data is compressed)
>
> The system is not set up well to track per single column value. It would
> not be too difficult to track at the table level, with the creation of
> new internal datatypes that would inherit from each other, i.e.
> a CompressedSQLChar that inherits from SQLChar.
>
> Understanding the code in the following directory is a good start:
> C:/derby/s1/java/engine/org/apache/derby/iapi/types
>
> Rick Hillegas wrote:
>>
>> Hi Reka,
>>
>> I would recommend looking at the Derby logic for encrypting databases.
>> You can probably get column compression to work by putting your
>> (de)compression logic alongside the encryption/decryption touchpoints.
>>
>> Hope this helps,
>> -Rick
>>
>> On 9/6/11 9:21 AM, Reka Thirunavukkarasu wrote:
>>>
>>> Hi Rick,
>>> Thank you for your immediate reply. We are trying to achieve
>>> attribute-level compression (in your words, more compact storage of
>>> columns). Attribute-level compression is best from the query-processing
>>> point of view. Attributes fall into three major categories: integer,
>>> floating point, and character string. We have to apply three different
>>> compression techniques, one for each datatype, but for demonstration
>>> purposes we will apply compression only to character-string attributes.
>>> We will test it on a database that has only character strings. This is
>>> our main goal.
>>> Thank you.
>>>
>>> On Tue, Sep 6, 2011 at 8:19 PM, Rick Hillegas <rick.hille...@oracle.com> wrote:
>>>>
>>>> Hi Reka,
>>>>
>>>> Can you give us more detail about what you are trying to achieve? That
>>>> may help us figure out what the right touchpoints are. Are you trying
>>>> to achieve any of the following:
>>>>
>>>> 1) More aggressive garbage-collection of deleted rows...
>>>>
>>>> 2) More compact storage of columns...
>>>>
>>>> 3) More compact storage of rows...
>>>>
>>>> 4) More compact storage of pages...
>>>>
>>>> 5) Something else...
>>>>
>>>> Thanks,
>>>> -Rick
>>>>
>>>> On 9/6/11 7:07 AM, Reka Thirunavukkarasu wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We are from the University of Moratuwa, Sri Lanka. We would like to
>>>>> apply data compression to Derby in query processing as a requirement
>>>>> of our Advanced Database course project.
>>>>>
>>>>> Currently Derby has a facility to trim the free space in a raw data
>>>>> container (using the SYSCS_UTIL.SYSCS_COMPRESS_TABLE system
>>>>> procedure). Our goal is to apply data compression (run-length
>>>>> encoding) to each of the values (not the field names) of a query
>>>>> before executing it, and to decompress the data when execution
>>>>> finishes.
>>>>>
>>>>> Initially we went through the code base and identified that the data
>>>>> compression could be applied within the executeStatement() method of
>>>>> the org.apache.derby.impl.jdbc.EmbedStatement class, before calling
>>>>> ps.execute(). We thought that, using the getParameterValueSet()
>>>>> method of the Activation class, the attribute values of the parsed
>>>>> query could be obtained. But when we try to print the contents of the
>>>>> ParameterValueSet for a typical insert query, it prints null (it is
>>>>> just an empty set).
>>>>>
>>>>> We are expecting help from the community regarding the following
>>>>> questions:
>>>>>
>>>>> 1) What is wrong with the point we identified for applying
>>>>> compression?
>>>>>
>>>>> 2) By applying compression before executing the query, will the
>>>>> query execution process be affected?
>>>>>
>>>>> 3) Is there any other possible place to apply compression and
>>>>> decompression before executing the query?
>>>>>
>>>>> Thank you.
>>>>> -Reka
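The run-length encoding Reka mentions for character strings could be sketched as below. This is a minimal standalone codec for illustration only; the class and method names are invented and are not part of Derby:

```java
// Hypothetical run-length codec for character strings. Each run of
// identical characters is encoded as <count><char>, e.g. "aaabcc" -> "3a1b2c".
// Not Derby code; names are illustrative only.
public final class RleCodec {

    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 1;
            // Count how far the run of identical characters extends.
            while (i + run < s.length() && s.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // Read the run length (may be more than one digit).
            int run = 0;
            while (Character.isDigit(s.charAt(i))) {
                run = run * 10 + (s.charAt(i++) - '0');
            }
            char c = s.charAt(i++);
            for (int k = 0; k < run; k++) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = encode("aaaabbbccd");
        System.out.println(packed);         // 4a3b2c1d
        System.out.println(decode(packed)); // aaaabbbccd
    }
}
```

Note that this toy format is ambiguous for input strings that themselves contain digits; a real on-disk format would encode the counts in a length-prefixed binary form instead. It also only pays off when the data actually contains long runs, which is worth measuring before committing to RLE for arbitrary character columns.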
--
Regards,
Reka :)
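Mike's point that "the final format should allow the system to tell the difference between the formats" could be sketched as follows: prefix each stored value with a one-byte format id so the reader can dispatch between a plain and a compressed representation, the way a CompressedSQLChar could coexist with a plain SQLChar on disk. The class name and format ids are invented for illustration, and java.util.zip's Deflater stands in for whatever codec is actually chosen; Derby's real on-disk formats differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: a stored string is prefixed with a format id byte so
// the reader can tell a compressed value from a plain one.
public final class CharFormat {
    public static final byte PLAIN = 0x01;
    public static final byte COMPRESSED = 0x02;

    // Write the value, compressing only when that actually saves bytes.
    public static void write(DataOutputStream out, String value) throws IOException {
        byte[] plain = value.getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(plain);
        byte formatId = packed.length < plain.length ? COMPRESSED : PLAIN;
        byte[] stored = formatId == COMPRESSED ? packed : plain;
        out.writeByte(formatId);
        out.writeInt(stored.length);
        out.write(stored);
    }

    // Dispatch on the format id read back from the stream.
    public static String read(DataInputStream in) throws IOException {
        byte formatId = in.readByte();
        byte[] stored = new byte[in.readInt()];
        in.readFully(stored);
        byte[] plain = formatId == COMPRESSED ? inflate(stored) : stored;
        return new String(plain, StandardCharsets.UTF_8);
    }

    // Demonstration helper: write a value and immediately read it back.
    public static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), value);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] data) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new IOException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
        System.out.println(roundTrip("xy"));
    }
}
```

The "compress only when it helps" check in write() matters for short values: the compressed form of "xy" is larger than the raw bytes, so it is stored plain, and the format id makes that decision invisible to the reader.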