Re: Retrieve header names for list feeds.

Jason Cipriani Mon, 04 Apr 2011 10:40:55 -0700

Yes! That would be perfect! =)

Jason



On Mon, Apr 4, 2011 at 1:20 PM, Vic Fryzel <[email protected]> wrote:

> I think ideally what I'd like to do is add indexed header information to
> the top of every lists feed, with both positional information, index names,
> and full names.  Does that sound like it'd work?
>
> -Vic
>
>
>
> On Mon, Apr 4, 2011 at 1:16 PM, Jason Cipriani 
> <[email protected]>wrote:
>
>> Hi Vic,
>>
>> I'm OK with the workaround except it's pretty kludgy. I would be more OK
>> with the workaround if you could guarantee that the following
>> assumptions will remain true in the future:
>>
>>   - List entry items are in left-to-right GUI order.
>>   - List entry always includes all items and never leaves out ones with
>> blank values.
>>   - List entry item names with blank column headers always start with an
>> underscore.
>>   - List entry item names with non-blank column headers never start with
>> an underscore.
>>
>> Can you make those guarantees?
>>
>> I think giving back header information with each list entry is almost less
>> ideal than the workaround just because the list feed already sends so much
>> data over the network (because it repeats the field names for every header).
>> In my Google Forms example, some of the questions are very long, and list
>> feed entry names account for over 80% of the amount of data in the list feed
>> - and I am already having usability issues with my web application because
>> of how long it takes to transfer spreadsheet data.
>>
>> What if you made an option to return column position indexes as keys in
>> the list feed entries *instead* of names? This would let you use the
>> workaround to associate fields with header values without having to rely on
>> any of the assumptions below, and would greatly reduce network usage for
>> list feeds as well. For me, the primary reason I use the list feed at all is
>> because the cell feed takes even more bandwidth (and has no guarantees on
>> the order of returned cells, so I had to piece together the returned data
>> using logic I posted about in another thread here), and list feed was the
>> most efficient way to transfer spreadsheet data that I could find.
>>
>> There currently does not seem to exist any way to efficiently retrieve
>> data from a spreadsheet without getting loads of redundant/irrelevant data
>> in the data stream.
>>
>> Jason
>>
>>
>> On Mon, Apr 4, 2011 at 12:16 PM, Vic Fryzel <[email protected]> wrote:
>>
>>> Again, I'm looking into just giving back header information with each
>>> list entry, which should resolve everyone's issues.  With a valid workaround
>>> of using the cells feed to look up the first row positionally though, this
>>> isn't a huge deal, in my honest opinion.
>>>
>>> -Vic
>>>
>>>
>>>
>>> On Thu, Mar 17, 2011 at 8:22 AM, David Bullock <[email protected]>wrote:
>>>
>>>> Hey Jason, that's a great write-up, and a great illustration of the
>>>> pain of (as you nicely put it) content-dependent keys.
>>>>
>>>> Just dreaming here, but follows might be the definition of a list.
>>>> (Borrowing heavily from the deprecated Table API).
>>>>
>>>>  <gs:data>
>>>>      <gs:header height="1" nativenames="1" />
>>>>      <gs:column idx="1" key="id" name="Item" />
>>>>      <gs:column idx="2" key="descr" name="Description" />
>>>>      <gs:column idx="3" key="qty" name="Quantity" />
>>>>      <gs:column idx="4" key="ea" name="Each" />
>>>>      <gs:column idx="5" key="total" name="Total" c1="=qty * ea" />
>>>>  </gs:data>
>>>>
>>>>
>>>> The 'name' attribute would be the content-dependent key, while the
>>>> 'key' would be the content-independent key, conforming to existing
>>>> list-feed naming constraints.  User changes to the sheet would modify
>>>> 'idx' and 'name', but 'key' would stay put (and the E-Tag and Last-
>>>> Modified of the resource containing this metatdata would change).
>>>> When updating this metadata via the ATOM/REST API, 'key' would be
>>>> required always, and either 'name' or 'idx' would suffice to establish
>>>> the binding (so long as 'name' could be distinctly found in the header
>>>> row when the metadata is POSTed or PUT or else a 4xx error).  If
>>>> *both* 'name' and 'idx' are supplied, the index establishes the
>>>> binding, and the content of the header cell is rewritten.
>>>>
>>>> Where to put this metadata?  How to integrate it with the existing
>>>> list feed?  OK, here's my proposal in baby-steps:
>>>>
>>>> Put this metadata into entries in a separate feed.  Call it the 'List
>>>> Definition' feed to avoid confusion with Tables.  Now, for each
>>>> worksheet in the spreadsheet, there is an automatic List Definition,
>>>> which have the property that their 'key' is always a function of
>>>> 'name' according to the present name-munging rules in the List API.
>>>> These automatic List Definitions can't have their column 'key' values
>>>> set via the API (except as a side-effect of setting 'name') or else
>>>> they give a 4xx HTTP status.
>>>>
>>>> Existing clients of the API access the lists defined by these
>>>> 'automatic list definitions' in the same way they do now.
>>>>
>>>> I guess by now we have to fill in a few details:
>>>>
>>>> <entry>
>>>>     <title>tableA</title>
>>>>     <gs:listtype>auto</gs:listtype>
>>>>     <gs:worksheet name="Invoices" />
>>>>     <gs:data>
>>>>      <gs:header height="1" nativenames="1" />
>>>>      <gs:column idx="1" key="id" name="Item" />
>>>>      <gs:column idx="2" key="descr" name="Description" />
>>>>      <gs:column idx="3" key="qty" name="Quantity" />
>>>>      <gs:column idx="4" key="ea" name="Each" />
>>>>      <gs:column idx="5" key="total" name="Total" c1="=qty * ea" />
>>>>  </gs:data>
>>>>  <content src="https://foo/bar"; type=http://schemas.google.com/
>>>> spreadsheets/2006#listfeed<http://schemas.google.com/spreadsheets/2006#listfeed>"
>>>> />
>>>> </entry>
>>>>
>>>> Now, if we want, we can add more list definitions (unique by <title/>
>>>> within the feed), defining whatever we liked.  Additional list feeds
>>>> are created for each definition we POST/PUT, using the key names for
>>>> the <gsx:*/> elements in the list entry.  (The <gs:field name=""
>>>> idx=""/> was probably a bit of overkill in the defunct Records feed).
>>>>
>>>> Any cool things that might have come with the now-defunct Record feed
>>>> in terms of queries can be added to the list feed's querying abillity.
>>>>
>>>> I slipped in the 'c1' attribute to <gs:column/> ... that's all about
>>>> declaring the formula for a given column, but only in terms of other
>>>> fields in the row.  Thought would need to be given to naming and
>>>> synatx, but think of it more as a template for a legal spreadsheet
>>>> expresion rather than a legal spreadsheet expression itself.  It might
>>>> only be valid in the 'automatic' list feed definitions. Such columns
>>>> would exhibit the property that, when you inserted a row via the API
>>>> or via the GUI, it auto-filled the formula into the cell.
>>>>
>>>> That API would certainly solve my use-cases.  What about you Jason?
>>>>
>>>> What about you, Google?
>>>>
>>>> cheers,
>>>> David.
>>>>
>>>> On Mar 17, 3:20 am, Jason Cipriani <[email protected]> wrote:
>>>> > I agree with David's sentiments. Personally, however, my use cases are
>>>> very
>>>> > different.
>>>> >
>>>> > I've been doing mostly web application development with PHP (and some
>>>> > desktop application development with Java). The Zend client library
>>>> does not
>>>> > support tables, only list and cell feeds.
>>>> >
>>>> > I map Google spreadsheet column names to application-specific column
>>>> names
>>>> > myself, all of my applications have a central configuration that
>>>> defines
>>>> > this mapping, and I have the spreadsheet access hidden behind a layer
>>>> of
>>>> > abstraction in my software. I abstract it away like this for two
>>>> reasons: 1)
>>>> > because, as David mentions, list feed keys are content-dependent and
>>>> > unreliable, and 2) so I can move away from Google spreadsheets some
>>>> day (the
>>>> > only reason I am using them is because I've been building applications
>>>> on
>>>> > top of existing systems powered by Google Forms).
>>>> >
>>>> > What I do to combat unreliable column headers is each time I access
>>>> the list
>>>> > feed, I compare the key names (and order) with the last time I
>>>> accessed the
>>>> > list feed. This is a virtually free operation compared to the list
>>>> feed
>>>> > access itself. If they change I have the application alert me via
>>>> email so I
>>>> > can go adjust the configuration. If the keys change to the point where
>>>> they
>>>> > might break the mapping in my application, the applications will
>>>> report an
>>>> > error to the user until I fix it -- for the applications I've been
>>>> writing
>>>> > it is more important to make sure the data is not corrupt rather than
>>>> making
>>>> > sure the apps have 100% uptime. Column changes happen infrequently
>>>> enough
>>>> > that this works well for me, the only reason they change is because
>>>> some
>>>> > users access the spreadsheets directly and change things they
>>>> shouldn't be
>>>> > changing (something I can't control).
>>>> >
>>>> > Because the spreadsheets I work with are generated by Google Forms,
>>>> and
>>>> > because Google Forms puts the wording of the question in the column
>>>> headers,
>>>> > and because sometimes the questions are very, very long, sometimes I
>>>> map by
>>>> > column index rather than list key because I simply don't feel
>>>> comfortable
>>>> > using 500 character key names that can change any time somebody, say,
>>>> fixes
>>>> > a grammatical error in a Google Forms question.
>>>> >
>>>> > Here is the algorithm I've been using to map column header values to
>>>> indexes
>>>> > and list feed keys. So far it has been working but it makes a lot of
>>>> > undocumented assumptions, and only works if there is at least one
>>>> non-header
>>>> > row in the spreadsheet (list feed must return a result). Weird
>>>> pseudocode
>>>> > follows:
>>>> >
>>>> > === BEGIN ===
>>>> >
>>>> > struct COLUMN {
>>>> >    string header;
>>>> >    int index;
>>>> >    string listkey;
>>>> >
>>>> > };
>>>> >
>>>> > headers = retrieve first row via cell feed, no empty cell return;
>>>> > columns = a list of COLUMN structures, initially empty;
>>>> >
>>>> > for each cell in headers {
>>>> >   COLUMN column;
>>>> >   column.header = cell.text;
>>>> >   column.index = cell.columnnumber;
>>>> >   // column.listkey not determined yet
>>>> >   columns.add(column);
>>>> >
>>>> > }
>>>> >
>>>> > sort columns by .index; // no guarantees that cells returned in order.
>>>> >
>>>> > // now 'columns' is a list of nonempty column headers and column
>>>> > // indices, from left to right in the spreadsheet.
>>>> >
>>>> > listitem = retrieve a single row via list feed;
>>>> > int curindex = 0;
>>>> >
>>>> > // assumptions:
>>>> > //   - keys for columns with empty headers will start with an
>>>> underscore.
>>>> > //   - list entry fields are in order left to right, and correspond
>>>> with
>>>> > //     column headers.
>>>> > for each field in listitem {
>>>> >   if (field.key starts with '_')
>>>> >     continue; // skip
>>>> >   columns[curindex].listkey = field.key;
>>>> >
>>>> > }
>>>> >
>>>> > // columns now contains column header, index, and list key name for
>>>> each
>>>> > // column. this data can be cached, and this cache invalidated if the
>>>> > // application detects a change in the list entry keys, which it can
>>>> do
>>>> > // as it retrieves items.
>>>> >
>>>> > === END ===
>>>> >
>>>> > *So for me, it would solve a ton of problems if the list feed simply
>>>> had a
>>>> > new column metadata element that contained column header names, column
>>>> > indices, and key names.*
>>>> >
>>>> > It's all very messy and hard to work with right now.
>>>> >
>>>> > Jason
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Mar 16, 2011 at 9:01 AM, David Bullock <[email protected]>
>>>> wrote:
>>>> > > In the lately deprecated tables API, if it had ever worked the way I
>>>> > > read that it was intended to work (which it once did), one wouldn't
>>>> > > ever need to discover the column names by reading the header row.
>>>> > > Instead, one would say something like the following when defining a
>>>> > > table:
>>>> >
>>>> > >  *in future, for my own convenience, I will refer to column B as
>>>> > > 'Description'"
>>>> >
>>>> > > The next day, when some sensible spreadsheet user inserts a column
>>>> to
>>>> > > the left of B and renames the new C1 to "Item Name", your app
>>>> doesn't
>>>> > > break.  Just like a named range, the sheet adjusts the reference so
>>>> > > that the name your code relies on - 'Description' - now refers to
>>>> > > column C ... and further, while the name your code chose and the
>>>> value
>>>> > > of the header cell C1 are now different, it's no big deal - the
>>>> column
>>>> > > has the same *semantic* meaning.  You're just isolated from changes
>>>> in
>>>> > > its position and its user-facing description.
>>>> >
>>>> > > How nice would it have been?  Very nice.  We could even have defined
>>>> > > table P to include columns A-F, and table Q to include just columns
>>>> A
>>>> > > and E (useful since the Spreadsheet API doesn't support partial
>>>> > > response/update).
>>>> >
>>>> > > Instead, Google screwed it up by introducing a two-way binding
>>>> between
>>>> > > the name the application chose to refer to the colum, and the value
>>>> of
>>>> > > the header cell.  Oomph.  Quick - back to the cells feed to see if
>>>> we
>>>> > > can work out (somehow) what to call these columns when addressing
>>>> them
>>>> > > via the list feed!
>>>> >
>>>> > > All kinds of wierd behaviour ensued (I could point you to a few bugs
>>>> I
>>>> > > personally filed), among the worst of which was the stupid, stupid
>>>> > > message "This operation is not permitted because it would alter the
>>>> > > structure of this table" when the spreadsheet user tried to do such
>>>> > > nefarious things as oh, fill-down a formula cell, or cut and paste a
>>>> > > range.
>>>> >
>>>> > > Having thus reduced the grand concept of tables to an annoying
>>>> > > repetition of the List API, some grown up in the playground
>>>> obviously
>>>> > > canned it.
>>>> >
>>>> > > And now we are stuck with this rigmarole of *looking  up* the column
>>>> > > names ... and Google are 'considering a feature that would allow you
>>>> > > to fetch the header row'.  Just let us bind our application-specific
>>>> > > names to column indexes already, and adjust them the way you do
>>>> named
>>>> > > ranges.
>>>> >
>>>> > > But it's like talking to a brick wall ...
>>>> >
>>>> > > David.
>>>> >
>>>> > > On Mar 16, 6:38 am, Vic Fryzel <[email protected]> wrote:
>>>> > > > Unfortunately, at the moment, you must use the cells feed to
>>>> accomplish
>>>> > > > this.  I'm considering a feature that would allow you to fetch the
>>>> header
>>>> > > > row with the lists feed, but I don't have an ETA yet.
>>>> >
>>>> > > > -Vic
>>>> >
>>>> > > > On Fri, Mar 4, 2011 at 9:56 PM, Jason Cipriani <
>>>> [email protected]
>>>> > > >wrote:
>>>> >
>>>> > > > >  Using PHP Zend bindings for spreadsheet API.
>>>> >
>>>> > > > > If I have a WorksheetEntry; I can use getContentsAsRows() to get
>>>> all of
>>>> > > the
>>>> > > > > entries in that worksheet's list feed.
>>>> >
>>>> > > > > The keys in each list entry are the column headers converted to
>>>> > > lowercase
>>>> > > > > with all non alpha-numeric characters stripped out.
>>>> >
>>>> > > > > How do I get the real column names for this worksheet entry, and
>>>> the
>>>> > > > > corresponding list entry keys for each column? Alternatively, a
>>>> way to
>>>> > > get
>>>> > > > > the real column name given the list entry key would work too.
>>>> >
>>>> > > > > My goal is to be able to both display the real column header
>>>> text
>>>> > > > > (spreadsheet column headers are Google Forms questions) to the
>>>> user of
>>>> > > my
>>>> > > > > application, and also to know what the list entry key is for
>>>> each
>>>> > > column so
>>>> > > > > I can display the values of the columns as well.
>>>> > > > >  One way I've been toying with is to use a cell feed to get the
>>>> first
>>>> > > row
>>>> > > > > and read the headers that way.
>>>> >
>>>> > > > > But then, how do I build a correspondence between the column
>>>> headers
>>>> > > and
>>>> > > > > the list entry values?
>>>> >
>>>> > > > > Am I guaranteed that the values in a list entry will be returned
>>>> in
>>>> > > > > left-to-right GUI order, and empty values will be present (value
>>>> count
>>>> > > will
>>>> > > > > be equal to column count)?
>>>> >
>>>> > > > > Being able to display the column headers and the corresponding
>>>> values
>>>> > > in an
>>>> > > > > application seems like a very reasonable goal, I feel like I'm
>>>> missing
>>>> > > > > something because this is surprisingly clunky...
>>>> >
>>>> > > > > Thanks!
>>>> > > > > Jason- Hide quoted text -
>>>> >
>>>> > > > - Show quoted text -- Hide quoted text -
>>>> >
>>>> > - Show quoted text -
>>>>
>>>
>>>
>>
>

Re: Retrieve header names for list feeds.

Reply via email to