Re: [Pytables-users] advice on data modelling

Anthony Scopatz Sat, 03 Mar 2012 12:58:38 -0800

Hi James,

If you are approaching this from a SQL perspective, many of your
questions may be answered on our Hints for SQL Users page
(http://www.pytables.org/moin/HintsForSQLUsers).


(Yes, we need to migrate this to sphinx.)

But in general I would agree with your assessment.  (1) is what I
would do first and, depending on query performance, I might index
it.  (2) would also be OK, but it uses more space; it might be
faster if you knew *exactly* the path to the data beforehand.  (3)
is not a strategy I normally use unless the data really requires it
for some reason.
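
To make (1) concrete, here is a minimal sketch of a denormalised
table with an index on one column.  It assumes the snake_case method
names of PyTables 3.x (the 2.x releases spelled these openFile,
createTable and so on), and the column names, sample rows and helper
functions are all made up for illustration, so treat it as a starting
point rather than a recommended schema.

```python
import os
import tempfile

import tables


class CustomerOrder(tables.IsDescription):
    # Hypothetical schema: one row per order, with the customer
    # columns repeated on every row (the "join" of the two SQL tables).
    customer_id = tables.Int32Col()
    customer_name = tables.StringCol(32)
    country = tables.StringCol(2)
    order_id = tables.Int32Col()
    amount = tables.Float64Col()


def build_demo_file(path):
    """Write a tiny denormalised table and index the country column."""
    sample_rows = [
        (1, b"Alice", b"GB", 100, 25.0),
        (1, b"Alice", b"GB", 101, 40.0),
        (2, b"Bob", b"US", 102, 15.5),
        (3, b"Carol", b"GB", 103, 99.0),
    ]
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "orders", CustomerOrder,
                                "customer joined with order")
        row = table.row
        for cid, name, country, oid, amount in sample_rows:
            row["customer_id"] = cid
            row["customer_name"] = name
            row["country"] = country
            row["order_id"] = oid
            row["amount"] = amount
            row.append()
        table.flush()
        # Optional: only worth it if queries on this column are slow.
        table.cols.country.create_index()


def order_ids_for_country(path, country):
    """Return the sorted order ids for one country (given as bytes)."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.orders
        matches = table.read_where("country == wanted",
                                   condvars={"wanted": country})
        return sorted(int(oid) for oid in matches["order_id"])


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "orders.h5")
    build_demo_file(demo_path)
    gb_orders = order_ids_for_country(demo_path, b"GB")

print(gb_orders)
```

The "all orders from customers in a country" query becomes a single
in-kernel selection on one table, which is the main attraction of
(1).  The cost is the one you already noticed: changing a customer
means rewriting every row that repeats that customer's data.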

I hope this points you in the right direction!

Be Well
Anthony

On Sat, Mar 3, 2012 at 7:28 AM, James Casbon <cas...@gmail.com> wrote:

> Hi,
>
> I'm new to pytables and I'm having trouble working out how to model
> data relationships.  Now, I realise that pytables is not a relational
> database, but there are a few ways to go and I want some advice on
> which is best.  I've checked the FAQ and the archives, but if this is
> answered before apologies and point me there.
>
> Imagine I am migrating from a SQL database with two tables, Customer
> and Order (this is not my problem, but it is easy to think about it
> this way).  Customers have many orders.
>
> There are three ways I can see to model this:
>
> 1. have a single denormalised PyTable where the customer data is
> repeated (effectively the table is the join of the two sql tables)
>
> 2. have many smaller tables in a hierarchy of customer/order
>
> 3. create two tables with a customer id (just port the sql tables to
> pytables)
>
> As a novice, the hierarchical focus of PyTables makes me think (2) is
> the way to go, but this has a high metadata cost (I think) bloating
> the file size.  Also, can I get a unified view of the individual
> customer tables when I want to query orders - ie a view of */order?
>
> So those problems indicate that (1) might be better.  However, I will
> then have a lot of repeated data.  Querying is easiest, but the harder
> operation here is changing a customer: many rows need updating.
>
> (3) seems the worst option.  If I want to get the orders from a set of
> related customers (all customers in a country), I have to do a query
> which looks for a customer id in a list of customer ids.
>
> Thanks,
> --
> James
> http://casbon.me/
>
>
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning
> Cloud computing makes use of virtualization - but cloud computing
> also focuses on allowing computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
