Hey folks,

I was talking to ottomata today about developing a schema for processing
revisions in Hadoop. We ran into a deep problem with field naming that I'd
like to raise, because I want people to be aware of it.

To explain this, I'll use an example.  Let's say you want to get the
namespace of this page:
https://en.wikipedia.org/wiki/Biology

In JavaScript, this is represented as the variable *wgNamespaceNumber*.

In the database, this is represented as *page.page_namespace*.

In the XML database dump, this is represented as the value at *<page><ns>* or
*<namespaces><namespace.key>*, depending on where you are.
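For anyone who hasn't dug through a dump, here is a rough sketch (Python,
standard library only) of where that number lives in the XML. The excerpt is
trimmed and hypothetical: real dumps declare an XML namespace (xmlns=...) and
carry many more elements, and the function name namespace_ids is made up for
illustration. The point is that the same integer appears once as an attribute
(key) and once as element text (<ns>).

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical excerpt shaped like a MediaWiki XML dump.
# Real dumps declare an XML namespace (xmlns=...) that this sketch omits.
DUMP_EXCERPT = """
<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>Biology</title>
    <ns>0</ns>
  </page>
</mediawiki>
"""


def namespace_ids(xml_text):
    """Return ({namespace id: namespace name}, {page title: namespace id})."""
    root = ET.fromstring(xml_text.strip())
    # In <siteinfo>, the numeric id is the "key" attribute of each <namespace>.
    # The main namespace has no name, so its element text is None.
    known = {int(ns.get("key")): (ns.text or "") for ns in root.iter("namespace")}
    # In each <page>, the very same numeric id is the text of <ns>.
    pages = {p.findtext("title"): int(p.findtext("ns")) for p in root.iter("page")}
    return known, pages


known, pages = namespace_ids(DUMP_EXCERPT)
print(known)  # {0: '', 1: 'Talk'}
print(pages)  # {'Biology': 0}
```

Whatever we call the field, it is this one integer (0 for Biology) that
wgNamespaceNumber and page.page_namespace also hold.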

Right now, ottomata and I are considering the more descriptive name
*page_namespace_id*, since the value of all of these variables/fields is an
identifier, not a name. I think that this is a *good* name if we
consider it in a vacuum, but if we choose it, we'll add yet another name
for wiki devs & analysts to be aware of.

Given the context of this decision, my instinct is to choose the least
surprising name.  Since I mostly work with the database, that would mean
I'd choose *page_namespace*.

This is just one example of such nonsense.  The decisions we make in
formats that we produce now can have immeasurable effects on the sanity of
others.  I hope that the decisions we make today will minimize such pain,
but it's hard to know for sure.

-Aaron
_______________________________________________
Analytics mailing list
analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
