Re: [appengine-java] Joins!
Thanks all. This is a lot of great information. I've learned a ton. -- You received this message because you are subscribed to the Google Groups Google App Engine for Java group. To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine-java/-/49KEIocr7IcJ. To post to this group, send email to google-appengine-java@googlegroups.com. To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
Re: [appengine-java] Joins!
Why isn't denormalization a real option? A lot of companies denormalize with great success, including Google. The thing about joins is this: they have to happen at some point in memory. Datastore or local instance.
-- Ikai Lan Developer Programs Engineer, Google App Engine plus.ikailan.com | twitter.com/ikai
On Thu, Aug 4, 2011 at 6:00 PM, William Levesque billleves...@gmail.com wrote:
Alright, so I've spent a lot of time contemplating this whole BigTable isn't relational limitation. I've tried two techniques for joining different tables. The solution described here... http://gae-java-persistence.blogspot.com/2010/03/executing-simple-joins-across-owned.html?showComment=1298589845909#c7562859098617623831 and joining with loops inside my code. The former eats a lot of CPU, the latter is just silly. Denormalizing isn't a real option. There are very good reasons normalization was developed. So I'm trying to get a definitive strategy from Google that is considered the best way to support a system with complex data relationships. I appreciate your help.
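For readers wondering what "joining with loops inside my code" can look like when the join happens in instance memory, here is a minimal sketch. It assumes both sides have already been fetched into lists, and it indexes one side in a HashMap instead of nesting loops. The `Author` and `Post` types and their fields are hypothetical, and no App Engine API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a hash join done in instance memory over two
// lists that have already been fetched. No App Engine API is used.
class InMemoryJoin {
    // Assumed entity shapes, for illustration only.
    static final class Author {
        final long id;
        final String name;
        Author(long id, String name) { this.id = id; this.name = name; }
    }

    static final class Post {
        final long authorId;
        final String title;
        Post(long authorId, String title) { this.authorId = authorId; this.title = title; }
    }

    // Inner join of posts to authors on Post.authorId == Author.id.
    static List<String> join(List<Author> authors, List<Post> posts) {
        // Build phase: index one side by its key in a single pass.
        Map<Long, Author> byId = new HashMap<>();
        for (Author a : authors) {
            byId.put(a.id, a);
        }
        // Probe phase: one map lookup per post instead of an inner loop.
        List<String> rows = new ArrayList<>();
        for (Post p : posts) {
            Author a = byId.get(p.authorId);
            if (a != null) { // posts without a matching author are dropped
                rows.add(a.name + ": " + p.title);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Author> authors = Arrays.asList(new Author(1L, "Ann"), new Author(2L, "Bob"));
        List<Post> posts = Arrays.asList(
                new Post(2L, "Hello"), new Post(3L, "Orphan"), new Post(1L, "Hi"));
        System.out.println(join(authors, posts));
    }
}
```

The work is roughly one pass over each list, but both lists still have to fit in the instance's memory, which is the cost being debated in this thread.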
Re: [appengine-java] Joins!
Because if you have denormalized data, then record updates can become enormous. If someone's address is denormalized into 1000 contact records, then when the user updates their address the system has to go out to all of the contact records and update them as well. And this gets multiplied by every complex relationship that exists in the data. And redundant fields can increase data size exponentially. Regardless, denormalization is only one option. It just seems that Google should publish guidelines for how to manage complex data relationships, with clear guidance on the advantages and disadvantages of each strategy. It's an important architectural consideration, and we are currently left to hunt and peck around for what is even available, let alone what is best practice for a given set of system requirements.
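The fan-out cost in the 1000-contact example can be made concrete with a small sketch. Everything here is hypothetical: plain in-memory maps stand in for stored records, and the method names are made up for illustration. It only counts how many records one address change touches in each layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts record writes for one address change.
// Plain maps stand in for stored records; no datastore API is used.
class FanOutUpdate {
    // Normalized layout: the address lives in one record, so an update is one write.
    static int updateNormalized(Map<String, String> addressByUser, String user, String newAddress) {
        addressByUser.put(user, newAddress);
        return 1;
    }

    // Denormalized layout: every contact record holds its own copy of the
    // address, so an update rewrites each copy. Returns the records written.
    static int updateDenormalized(Map<Integer, String> addressCopyByContactId, String newAddress) {
        int writes = 0;
        for (Map.Entry<Integer, String> entry : addressCopyByContactId.entrySet()) {
            entry.setValue(newAddress);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        Map<String, String> normalized = new HashMap<>();
        normalized.put("william", "1 Old Street");

        Map<Integer, String> copies = new HashMap<>();
        for (int contactId = 0; contactId < 1000; contactId++) {
            copies.put(contactId, "1 Old Street");
        }

        System.out.println("normalized writes: "
                + updateNormalized(normalized, "william", "2 New Road"));
        System.out.println("denormalized writes: "
                + updateDenormalized(copies, "2 New Road"));
    }
}
```

In this model the write cost grows linearly with the number of copies, once per relationship that embeds the address.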
Re: [appengine-java] Joins!
William, could you explain how the update can be enormous with a denormalized table? My understanding is that a flat table is easier to update than a normalized one. Thanks.
On Aug 5, 2011 1:36 PM, Ikai Lan (Google) ika...@google.com wrote:
Why isn't denormalization a real option? A lot of companies denormalize with great success, including Google. The thing about joins is this: they have to happen at some point in memory. Datastore or local instance.
Re: [appengine-java] Joins!
I was trying to explain that with...
If someone's address is denormalized into 1000 contact records, then when the user updates their address the system has to go out to all of the contact records and update them as well. And this gets multiplied by every complex relationship that exists in the data. And redundant fields can increase data size exponentially.
But is Google's position that all data should be denormalized?
Re: [appengine-java] Joins!
William, you might want to go over this http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/mapreduce-osdi04.pdf, and come back again with any questions. Ikai and possibly others were trying to convey to you that the Bigtable approach is more scalable than the relational approach. If it works for Google, why wouldn't it work for you? Do you have more data than Google?
On Fri, Aug 5, 2011 at 2:45 PM, William Levesque billleves...@gmail.com wrote:
I was trying to explain that with... If someone's address is denormalized into 1000 contact records, then when the user updates their address the system has to go out to all of the contact records and update them as well. And this gets multiplied by every complex relationship that exists in the data. And redundant fields can increase data size exponentially. But is Google's position that all data should be denormalized?
Re: [appengine-java] Joins!
I didn't mean to suggest that. Yes, a fan-out is potentially bad, but the problem with the normalized approach is that you optimize equally for reads and writes. In the address book example, I update my address book about once every 3 years. I read my address book 20 times a day. I think it's fair to pay the cost of a fan-out on change because a change is so infrequent.
Normalization has its benefits: it's harder for things to get out of sync, which is ALWAYS a risk with denormalization. A denormalized solution tends to favor eventually consistent approaches over strongly consistent approaches. My point is that every app can be built with a denormalized approach, and in the majority of cases, you actually *want* to build your app with this approach, not the other way around.
-- Ikai Lan Developer Programs Engineer, Google App Engine plus.ikailan.com | twitter.com/ikai
On Fri, Aug 5, 2011 at 12:00 PM, JT jem...@gmail.com wrote:
William, you might want to go over this http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/mapreduce-osdi04.pdf, and come back again with any questions. Ikai and possibly others were trying to convey to you that the Bigtable approach is more scalable than the relational approach. If it works for Google, why wouldn't it work for you? Do you have more data than Google?
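To put rough numbers on the read/write trade-off in the address book example, here is a back-of-the-envelope sketch. The 20 reads a day and one update every 3 years come from the message above; the unit costs are assumptions chosen only for illustration (a normalized read as two fetches, a denormalized write as a fan-out to 1000 copies, borrowing the figure from earlier in the thread). The class and method names are hypothetical.

```java
// Hypothetical cost model; the unit costs are illustrative assumptions,
// not measurements of the datastore.
class ReadWriteCost {
    // Total operations for a workload under a simple linear cost model.
    static long totalOps(long reads, long opsPerRead, long writes, long opsPerWrite) {
        return reads * opsPerRead + writes * opsPerWrite;
    }

    public static void main(String[] args) {
        long reads = 20L * 365 * 3; // 20 reads a day for 3 years = 21,900 reads
        long writes = 1;            // one address change in the same period

        // Assumed: normalized read = 2 fetches (contact + address), write = 1.
        long normalized = totalOps(reads, 2, writes, 1);
        // Assumed: denormalized read = 1 fetch, write = fan-out to 1000 copies.
        long denormalized = totalOps(reads, 1, writes, 1000);

        System.out.println("normalized total ops:   " + normalized);   // 43801
        System.out.println("denormalized total ops: " + denormalized); // 22900
    }
}
```

Under these assumed costs the one expensive write is outweighed by the cheaper reads; with a write-heavy workload the same formula tips the other way.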
Re: [appengine-java] Joins!
As far as managing complex data relationships, I don't think such a set of practices exists. What I can and should do (once I get some time) is publish some case studies about how we have persisted data in some cases. True, denormalizing data often requires you to think a little bit, but that's also why I like it: the school of normalization can often lead people into a checklist approach for figuring out persistence schemas, which can often produce substandard structures.
Here's an example of a recent internal app I built/am building: a trip planner. For each user, I store every trip that user takes in a serialized structure on that user. For each region, I store a serialized list of trips to that region. Whenever someone updates a trip, I have to update several structures, but the assumption is that it will be read-heavy instead of update-heavy. The application is heavily denormalized and uses get-by-key as much as possible.
This app could also very easily have been built using a normalized approach: the trips table joins to regions, which joins to cities; users join to user_trips, which joins to trips.
The way to approach denormalization is to think about what you are trying to achieve and, starting from there, work backwards to figure out how to save the data.
-- Ikai Lan Developer Programs Engineer, Google App Engine plus.ikailan.com | twitter.com/ikai
On Fri, Aug 5, 2011 at 12:13 PM, Ikai Lan (Google) ika...@google.com wrote:
I didn't mean to suggest that. Yes, a fan-out is potentially bad, but the problem with the normalized approach is that you optimize equally for reads and writes. In the address book example, I update my address book about once every 3 years. I read my address book 20 times a day. I think it's fair to pay the cost of a fan-out on change because a change is so infrequent. Normalization has its benefits: it's harder for things to get out of sync, which is ALWAYS a risk with denormalization. A denormalized solution tends to favor eventually consistent approaches over strongly consistent approaches. My point is that every app can be built with a denormalized approach, and in the majority of cases, you actually *want* to build your app with this approach, not the other way around.
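The trip planner layout described above might be sketched roughly as follows. This is not the actual app: a plain in-memory map stands in for the datastore, a list of strings stands in for the serialized structure, and the class name, method names, and the "user:" and "region:" key prefixes are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a denormalized, get-by-key layout.
// A HashMap stands in for the datastore; no App Engine API is used.
class TripPlannerSketch {
    private final Map<String, List<String>> store = new HashMap<>();

    // Write path: one logical change is written to every structure that embeds it.
    void addTrip(String userId, String region, String trip) {
        append("user:" + userId, trip);
        append("region:" + region, trip);
    }

    // Read paths: a single get-by-key each, with no query and no join.
    List<String> tripsForUser(String userId) {
        return get("user:" + userId);
    }

    List<String> tripsToRegion(String region) {
        return get("region:" + region);
    }

    private void append(String key, String trip) {
        store.computeIfAbsent(key, k -> new ArrayList<>()).add(trip);
    }

    private List<String> get(String key) {
        List<String> trips = store.get(key);
        return trips == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(trips);
    }

    public static void main(String[] args) {
        TripPlannerSketch planner = new TripPlannerSketch();
        planner.addTrip("alice", "Kyoto", "Spring 2011");
        planner.addTrip("bob", "Kyoto", "Autumn 2011");
        System.out.println(planner.tripsForUser("alice"));
        System.out.println(planner.tripsToRegion("Kyoto"));
    }
}
```

Each write touches two structures so that each read can be a single lookup, which matches the read-heavy assumption stated in the message.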
Re: [appengine-java] Joins!
I am not sure what you meant by fan out and fan in, but I agree with you that in the relational world data are more consistent, as they are stored and enforced by constraints etc. But the denormalized form does not require joins, which makes it more scalable because there is less overhead. If one high-level entity exists in multiple groups, yes, it is a waste of (Google's) storage space and more updates are needed, but hasn't MapReduce proven that it is still less intensive than table joins?
- Reply message - From: Ikai Lan (Google) ika...@google.com Date: Fri, Aug 5, 2011 3:13 pm Subject: [appengine-java] Joins! To: google-appengine-java@googlegroups.com
Re: [appengine-java] Joins!
On Fri, Aug 5, 2011 at 11:45 AM, William Levesque billleves...@gmail.com wrote:
But is Google's position that all data should be denormalized?
I don't think anyone would say that. I wrote up my thoughts around this subject here: http://blog.similarity.com/post/7541938593/how-to-build-an-online-dating-site-nosql-edition
The upshot is that we've been conditioned by SQL theorists to believe that there is a proper way of modeling data; that the data itself defines the schema and the magic of the RDBMS behind the curtain makes it fast. Unfortunately, this is a lie. It worked up to a point, but the traffic demands of a mass consumer application have vastly outstripped the RDBMS. You're back to figuring out how to optimize your schema for your particular query profile.
So the answer is not "denormalize everything", it's "denormalize the right things". And the right things will vary from application to application. You just have to build up a correct mental model of how the datastore performs and then design your application accordingly.
Jeff