Hi!

On Mon, Dec 1, 2008 at 1:03 AM, Brian Aker <[EMAIL PROTECTED]> wrote:
>>> 1) Does data loading need to be JOINABLE?
>>
>> In the scenario I am thinking of, yes. The scenario is scanning a
>> large stream of rows, and using values from the stream to look up keys
>> in a number of relatively small, well indexed tables. (If it helps at
>> all - the idea is to have something to conveniently load a star schema
>> directly from a CSV file. Each row from the CSV file would ultimately
>> result in one row in a fact table, but instead of inserting the values
>> from the CSV file, we'd be inserting integer key values that we looked
>> up in dimension tables).
>
> You have lost me here :)

Sorry. What I mean is: suppose I have a large file, let's say an
Apache access log.

Now, for analysis, I want to load this into a star schema. The fact
table would be called something like 'fact_visit' and it would have
only one column to store a metric (bytes_served). Apart from that,
this table would have many integer keys that point to dimension tables
(request date dimension, request time dimension, response dimension,
remote host dimension, request resource dimension) which can be used
by analysis tools to break down the visits by all kinds of aspects.
Assuming the dimension tables are up to date, a convenient way to
load this fact table would be to join the logfile rows to all the
dimension tables to look up their keys.
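To make the lookup idea concrete, here is a minimal sketch using Python's
sqlite3 module (table and column names are just illustrative, and only one
dimension is shown; the same INSERT ... SELECT shape would apply per
dimension in MySQL/Drizzle):

```python
import sqlite3

# Toy star schema: one dimension (remote host) and the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_host (host_key INTEGER PRIMARY KEY, hostname TEXT UNIQUE);
CREATE TABLE fact_visit (host_key INTEGER, bytes_served INTEGER);
INSERT INTO dim_host (hostname) VALUES ('10.0.0.1'), ('10.0.0.2');
""")

# Parsed log rows: (hostname, bytes_served), as they would come from the CSV.
log_rows = [("10.0.0.1", 512), ("10.0.0.2", 1024), ("10.0.0.1", 2048)]

# Instead of inserting the hostname itself, join against the dimension
# table and insert the integer surrogate key it resolves to.
conn.executemany(
    "INSERT INTO fact_visit (host_key, bytes_served) "
    "SELECT host_key, ? FROM dim_host WHERE hostname = ?",
    [(bytes_served, host) for host, bytes_served in log_rows],
)

for row in conn.execute("SELECT host_key, bytes_served FROM fact_visit"):
    print(row)  # prints (1, 512), (2, 1024), (1, 2048)
```

With a CSV storage engine, the `log_rows` list would simply be the CSV
table itself, and the whole load becomes one INSERT ... SELECT with joins.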

Hope that clears it up.

>> Well, it would be nice to be able to dump a large set to CSV by writing it
>> to a CSV table. But update as in, modify an existing row - that is not
>> what I would be holding my breath for.
>
> UPDATE and DELETE on rows are what make the CSV engine overly complicated.

Right, got you. Well, I could live without UPDATE and DELETE for CSV.
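For the dump direction, the equivalent of writing a result set into a CSV
table can be sketched outside the engine; a minimal example (file, table,
and column names are made up for illustration):

```python
import csv
import sqlite3

# Toy result set to dump.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_visit (host_key INTEGER, bytes_served INTEGER);
INSERT INTO fact_visit VALUES (1, 512), (2, 1024);
""")

cur = conn.execute("SELECT host_key, bytes_served FROM fact_visit")
with open("fact_visit.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([d[0] for d in cur.description])  # header row
    writer.writerows(cur)                             # stream rows out
```

A CSV engine makes this a plain INSERT ... SELECT into the CSV table, with
no client-side scripting needed.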

> Keep in mind, CSV was done just as an example. I've been surprised at how
> many folks have turned out to find it useful (but then people find new uses
> for blackhole all the time as well).

Well, I guess it formally stopped being just an example the moment it
was used for the general and slow query logs.

kind regards,

Roland

>> Well, IMO, there are indeed good tools external to the database to do
>> all this stuff. Exchanging data through a storage engine in this
>> manner would merely be a convenience feature.
>
> Unless it was for performance? Loading data is something that is a low
> hanging fruit for us to do. Though in the end it is very dependent on the
> engine.



>
> Cheers,
>        -Brian
>
>>
>>
>> Kind regards,
>>
>> Roland
>>
>>>
>>> Cheers,
>>>      -Brian
>>>
>>> --
>>> _______________________________________________________
>>> Brian "Krow" Aker, brian at tangent.org
>>> Seattle, Washington
>>> http://krow.net/                     <-- Me
>>> http://tangent.org/                <-- Software
>>> _______________________________________________________
>>> You can't grep a dead tree.
>> --
>> Roland Bouman
>> http://rpbouman.blogspot.com/
>



-- 
Roland Bouman
http://rpbouman.blogspot.com/

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
