On Tue, Sep 6, 2011 at 11:36 PM, Mike Matrigali <mikem_...@sbcglobal.net> wrote:
> The encryption points work at a page level, and the system counts on the
> number of bytes staying the same, because we find pages by page number *
> number of bytes per page. If you decide to go at it from this level you
> will need to implement an underlying filesystem to map the pages. I don't
> think this is very interesting, as I believe you can get this effect "for
> free" on a number of OSes by just picking a compressed filesystem and
> putting Derby on that filesystem.
>
> If I were working on this, I think I would look at the point where each
> datatype that you are trying to compress is read from and written to disk.
> Start with looking at the various "readExternal*" and "writeExternal*"
> routines for each datatype. Start by understanding the current on-disk
> formats of the datatypes and then propose the new on-disk formats.

Is there any way to do compression before executing the query, for example
during query parsing or query optimisation? We want to investigate whether
the compression affects query execution speed.

> Note the result of this work would not be appropriate for submission as
> is; a complete project would suggest how the user would control whether or
> not to compress, and the final format should allow the system to tell the
> difference between the formats. There are many options here; for instance,
> we could track compression at the following levels:
>   per single column value
>   per single column in a table (i.e. metadata indicates the column is
>     compressed in this table)
>   per all columns in a table (i.e. metadata indicates all columns in the
>     table are compressed)
>   per database (i.e. metadata in the database says all data is compressed)
>
> The system is not set up well to track per single column value. It would
> not be too difficult to track at the table level, with the creation of
> new internal datatypes that would inherit from each other, i.e.
> a CompressedSQLChar that inherits from SQLChar.
>
> Understanding the code in the following directory is a good start:
> C:/derby/s1/java/engine/org/apache/derby/iapi/types
>
> Rick Hillegas wrote:
>>
>> Hi Reka,
>>
>> I would recommend looking at the Derby logic for encrypting databases.
>> You can probably get column compression to work by putting your
>> (de)compression logic alongside the encryption/decryption touchpoints.
>>
>> Hope this helps,
>> -Rick
>>
>> On 9/6/11 9:21 AM, Reka Thirunavukkarasu wrote:
>>>
>>> Hi Rick,
>>> Thank you for your immediate reply. We are trying to achieve
>>> attribute-level compression (in your words, more compact storage of
>>> columns). Attribute-level compression is best from the query-processing
>>> point of view. Attributes fall into three major categories: integer,
>>> floating point, and character string. We have to apply three different
>>> compression techniques, one for each datatype, but for demonstration
>>> purposes we will apply compression only to character-string attributes.
>>> We will test it on a database that has only character strings. This is
>>> our main goal.
>>> Thank you.
>>>
>>> On Tue, Sep 6, 2011 at 8:19 PM, Rick Hillegas <rick.hille...@oracle.com> wrote:
>>>>
>>>> Hi Reka,
>>>>
>>>> Can you give us more detail about what you are trying to achieve? That
>>>> may help us figure out what the right touchpoints are. Are you trying
>>>> to achieve any of the following:
>>>>
>>>> 1) More aggressive garbage-collection of deleted rows...
>>>>
>>>> 2) More compact storage of columns...
>>>>
>>>> 3) More compact storage of rows...
>>>>
>>>> 4) More compact storage of pages...
>>>>
>>>> 5) Something else...
>>>>
>>>> Thanks,
>>>> -Rick
>>>>
>>>> On 9/6/11 7:07 AM, Reka Thirunavukkarasu wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We are from the University of Moratuwa, Sri Lanka. We would like to
>>>>> apply data compression to Derby in query processing as a requirement
>>>>> of our Advanced Database course project.
>>>>>
>>>>> Currently Derby has a facility to trim the free space in a raw data
>>>>> container (using the SYSCS_UTIL.SYSCS_COMPRESS_TABLE system
>>>>> procedure). Our goal is to apply data compression (run-length
>>>>> encoding) to each of the values (not the field names) of a query
>>>>> before executing it, and to decompress the data when execution
>>>>> finishes.
>>>>>
>>>>> Initially we went through the code base and identified that the data
>>>>> compression could be applied within the executeStatement() method of
>>>>> the org.apache.derby.impl.jdbc.EmbedStatement class, before calling
>>>>> ps.execute(). We thought that, using the getParameterValueSet()
>>>>> method of the Activation class, the attribute values of the parsed
>>>>> query could be obtained. But when we try to print the contents of the
>>>>> ParameterValueSet for a typical insert query, it prints null (it is
>>>>> just an empty set).
>>>>>
>>>>> We are expecting help from the community regarding the following
>>>>> questions:
>>>>>
>>>>> 1) What is wrong with the point we identified for applying
>>>>> compression?
>>>>>
>>>>> 2) By applying compression before executing the query, will the
>>>>> query execution process be affected?
>>>>>
>>>>> 3) Is there any other possible place to apply compression and
>>>>> decompression before executing the query?
>>>>>
>>>>> Thank you.
>>>>> -Reka
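The run-length encoding Reka mentions for character strings could be sketched as below. This is a minimal standalone codec for illustration only; the class and method names are invented and are not part of Derby:

```java
// Hypothetical run-length codec for character strings. Each run of
// identical characters is encoded as <count><char>, e.g. "aaabcc" -> "3a1b2c".
// Not Derby code; names are illustrative only.
public final class RleCodec {

    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 1;
            // Count how far the run of identical characters extends.
            while (i + run < s.length() && s.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // Read the run length (may be more than one digit).
            int run = 0;
            while (Character.isDigit(s.charAt(i))) {
                run = run * 10 + (s.charAt(i++) - '0');
            }
            char c = s.charAt(i++);
            for (int k = 0; k < run; k++) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = encode("aaaabbbccd");
        System.out.println(packed);         // 4a3b2c1d
        System.out.println(decode(packed)); // aaaabbbccd
    }
}
```

Note that this toy format is ambiguous for input strings that themselves contain digits; a real on-disk format would encode the counts in a length-prefixed binary form instead. It also only pays off when the data actually contains long runs, which is worth measuring before committing to RLE for arbitrary character columns.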
--
Regards,
Reka :)
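Mike's point that "the final format should allow the system to tell the difference between the formats" could be sketched as follows: prefix each stored value with a one-byte format id so the reader can dispatch between a plain and a compressed representation, the way a CompressedSQLChar could coexist with a plain SQLChar on disk. The class name and format ids are invented for illustration, and java.util.zip's Deflater stands in for whatever codec is actually chosen; Derby's real on-disk formats differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: a stored string is prefixed with a format id byte so
// the reader can tell a compressed value from a plain one.
public final class CharFormat {
    public static final byte PLAIN = 0x01;
    public static final byte COMPRESSED = 0x02;

    // Write the value, compressing only when that actually saves bytes.
    public static void write(DataOutputStream out, String value) throws IOException {
        byte[] plain = value.getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(plain);
        byte formatId = packed.length < plain.length ? COMPRESSED : PLAIN;
        byte[] stored = formatId == COMPRESSED ? packed : plain;
        out.writeByte(formatId);
        out.writeInt(stored.length);
        out.write(stored);
    }

    // Dispatch on the format id read back from the stream.
    public static String read(DataInputStream in) throws IOException {
        byte formatId = in.readByte();
        byte[] stored = new byte[in.readInt()];
        in.readFully(stored);
        byte[] plain = formatId == COMPRESSED ? inflate(stored) : stored;
        return new String(plain, StandardCharsets.UTF_8);
    }

    // Demonstration helper: write a value and immediately read it back.
    public static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), value);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] data) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new IOException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
        System.out.println(roundTrip("xy"));
    }
}
```

The "compress only when it helps" check in write() matters for short values: the compressed form of "xy" is larger than the raw bytes, so it is stored plain, and the format id makes that decision invisible to the reader.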