Re: [Ledger-smb-devel] [DESIGN] Proposed structure for LedgerSMB web services

John Locke Tue, 28 Jul 2015 01:28:30 -0700

Hi,

On 07/27/2015 11:54 AM, Erik Huelsmann wrote:

On Mon, Jul 27, 2015 at 7:03 PM, John Locke <m...@freelock.com> wrote:
    Hi, Erik,

    Nice start. Some quick comments (not a lot of time...):

    On 07/27/2015 05:51 AM, Erik Huelsmann wrote:
    VERSION DETECTION
    =================

    The user of the api should run an OPTIONS request on the base URL
    (/api) to discover version, options and features of the API.
    This is starting to sound like a WSDL? I think that could be a
    huge benefit to do, as long as it's not required...
What I was talking about is discussed in this blog post: http://zacstewart.com/2012/04/14/http-options-method.html

Very cool! Following some links from there led me here: http://www.restdoc.org/

As to what you mean by "required", I don't know. If you mean "required to read before using the API", then, no, it's not required. If you mean "required when implementing a new service", then I think the answer should be "yes, it's required". Every service requires documentation and if this is the only documentation, then I'm quite happy to accept the service.


Agreed, sounds good.
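
For what it's worth, the discovery call could look roughly like this from a Python client. Only the /api path and the use of OPTIONS come from the proposal; the host, the constant and the helper name are made up for illustration, and the sketch builds the request without sending it, since the shape of the response is not defined yet.

```python
from urllib.request import Request

# Hypothetical base URL; only the "/api" suffix comes from the proposal.
BASE_URL = "https://example.com/path/to/ledgersmb"


def build_discovery_request(base_url: str) -> Request:
    """Build (but do not send) the OPTIONS request against <base>/api."""
    return Request(base_url.rstrip("/") + "/api", method="OPTIONS")


req = build_discovery_request(BASE_URL)
print(req.get_method(), req.full_url)
```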

And if you're creating documentation anyway...


    URL STRUCTURE
    ==============

    All URLs in the document are assumed to be relative to some base
    URL. E.g. assuming LedgerSMB to be hosted under the following
    URL: https://example.com/path/to/ledgersmb/ , the URL /api in
    this document in fact means
    https://example.com/path/to/ledgersmb/api .

    The proposed URL structure is (as can be found in many existing
    web service schemas):

     (a) /api/<version>/<resource>/[?query parameters]
     (b) /api/<version>/<resource>/id
     (c) /api/<version>/<resource>/id?perform=<action>

    The above is mostly inspired on the PayPal API which - I think -
    drives a system much like ours in the sense that their system
    manages workflow producing transactions.

    In our case, I think the "id" specifier in the resource may be
    multiple path segments long; e.g. for currency rates:
    /api/v1/exchangerate/EUR/1/2015-12-12 where "EUR/1/2015-12-12" is
    the identifier for the: currency identifier, rate type and
    (start)date of the rate.
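
A rough sketch of how a router might split such a URL, with everything after the resource name treated as the (possibly multi-segment) identifier. The class and function names are hypothetical, and this is only one way to read the proposal.

```python
from typing import NamedTuple, Optional, Tuple


class ApiPath(NamedTuple):
    version: str
    resource: str
    id_segments: Tuple[str, ...]  # empty for collection URLs (form a)


def parse_api_path(path: str) -> Optional[ApiPath]:
    """Split '/api/<version>/<resource>/<id...>' into its parts."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3 or parts[0] != "api":
        return None  # not an API URL in the proposed shape
    return ApiPath(parts[1], parts[2], tuple(parts[3:]))


print(parse_api_path("/api/v1/exchangerate/EUR/1/2015-12-12"))
```

Here the three trailing segments come back as the currency, rate type and start date; how each resource interprets its own segments is left open.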

    Form (a) will be used for creating (POST) and listing (GET)
    resources instances. Dojo proposes to use the 'Range:' HTTP
    header to limit results in the request. I think that makes more
    sense than to use query parameters for it.


    For development/debugging, I really like having an API observe
    query parameters in addition to Range headers. I would suggest we
    support both, and pick one to win...

Hmm. I'm not really in favor of having duplicate functionality. People seem to be using cURL to test their services; it should be pretty easy to add the Range header (or others, for that matter) to a cURL request. What method do you use?

Ha. For a while I was using a home-grown Dojo single-page app for testing out APIs, and have played around with it quite a bit, but it's been a while since I've done a major API project. I've seen some decent browser extensions for some of these kinds of things...

The other thing I'm thinking of here is for more light-weight, reporting types of uses. I'm not sure how much control you can get over headers when doing a cross-domain request from a browser -- I'm thinking a lightweight JS app that might want to grab the last 10 sales invoices for a dashboard, or something like that -- with an iframe, for example, you can't necessarily set browser headers but you can easily add a GET parameter.

Not a big deal these days, there's so many decent tools for doing it right with a toolkit that we may not need the "lightweight" GET-only approach, but I do think there may be scenarios where it might prove useful...
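
To make "support both, and pick one to win" concrete, here is a minimal sketch. The `range` query parameter name and the rule that it beats the header are assumptions of mine, not something settled in this thread; the `items=<start>-<end>` syntax is what I recall Dojo's JsonRest store sending, so treat that as unverified too.

```python
import re
from typing import Mapping, Optional, Tuple

_RANGE_RE = re.compile(r"^items=(\d+)-(\d+)$")


def parse_range(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse 'items=<start>-<end>' (inclusive); None if absent or invalid."""
    if not value:
        return None
    match = _RANGE_RE.match(value.strip())
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return (start, end) if start <= end else None


def resolve_range(headers: Mapping[str, str],
                  query: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Assumption: the hypothetical 'range' query parameter wins over the header."""
    return parse_range(query.get("range")) or parse_range(headers.get("Range"))


print(resolve_range({"Range": "items=0-24"}, {"range": "items=0-9"}))
```

A missing or malformed query parameter falls back to the header. The header lookup here is case-sensitive, which a real implementation would need to handle.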

    Form (b) will be used for retrieving (GET) individual resource
    instances.
    Should we add support for PUT here?
Well, the entire API doesn't list PUT, because PUT was mapped to "U" of CRUD. And the design explicitly delegates updates of resources to RPC calls. The reasoning behind that is: old code looks the way it does because it *tries* to derive based on a (partial) "before" state and an "after" state what the user might have done in the web application. Most of the time it guesses correctly, but there are lots of cases where it simply can't tell the user's intent from the combined states. I don't want to propagate that mess to another level of the application when I'm able to eliminate it somewhere (by building a different UI).
    Form (c) will be used for (POST) modifying state of individual
    resource instances by executing <action> on the specified resource.


    MEANING OF REQUEST TYPES
    =========================

    (Note that the API doesn't attach meaning to the HTTP request
    types PUT and PATCH (which the PayPal API *does* do)) -- I could
    see value in supporting a PATCH request for resources which
    require secondary approval and have not yet been approved (this
    is where PayPal uses it too).

    GET
    ------
     Retrieves an object or collection of objects, potentially
    restricted by query parameters or HTTP headers.

    POST
    --------
     Creates an object or collection of objects when executed on a
    resource URL; when executed on a resource-instance URL, a
    required ?perform=<action> query string is to be added to the URL
    to specify which state transition is to be executed.

    Each POST request in the API carries a payload where the
    consuming service should support at least one of the following
    formats (as indicated by the OPTIONS response)

     (i) application/json
     (ii) application/xml
     (iii) application/form-data
     (iv) application/x-www-form-urlencoded
    The API itself should be responsible for doing this conversion --
    and should allow the consuming client to send whichever of these
    it wants. The API can then convert to a Perl data object of some
    kind to pass off to the internal code.
Ok. You're saying there's *always* going to exist a mapping from application/form-data to application/json? I mean, I can imagine a mapping like that for non-nested structures, but what about nested objects in arrays? I mean, in the new multi currency branch we have form fields named debit_1 and debit_fx_1; how do those map to a JSON/Javascript object?
Don't get me wrong; I'd like to delegate this to the request consumer too.

I think we just define a convention, and describe it. Perhaps make it simply mirror the Json structure with _ separated parts? e.g. debit_1_value=234&debit_1_fx=222&debit_2_value=444&debit_2_fx=400 maps to json as: (intentionally swapping the index to the 2nd position)

[{
  "debit":{
    "value":234,
    "fx":222
  }
},
{
  "debit":{
    "value":444,
    "fx":400
  }
}]

... I mean we already do this for form posts now, we need to convert it to some sort of data object internally anyway, why not build a library that does this for us, regardless of what format it receives in the request? Might need to change some of the current form field names...
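
A minimal sketch of that convention, assuming every field is named `<name>_<index>_<field>` as in the example above. The function name is hypothetical, and turning all-digit values into integers is my assumption to match the JSON shown; the thread does not settle typing.

```python
import json
from urllib.parse import parse_qsl


def form_to_rows(body: str) -> list:
    """Map '<name>_<index>_<field>=value' pairs onto a list of nested dicts."""
    rows = {}  # index -> {name: {field: value}}
    for key, value in parse_qsl(body):
        # Raises ValueError for keys that do not have three '_'-separated parts.
        name, index, field = key.split("_", 2)
        # Assumption: all-digit values become integers.
        parsed = int(value) if value.isdigit() else value
        rows.setdefault(int(index), {}).setdefault(name, {})[field] = parsed
    return [rows[i] for i in sorted(rows)]


body = "debit_1_value=234&debit_1_fx=222&debit_2_value=444&debit_2_fx=400"
print(json.dumps(form_to_rows(body)))
```

Run on the example query string, this yields the same two-element structure as the JSON above. Field names that themselves contain underscores would need a stricter naming rule than this sketch enforces.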

This and the previous note do bring up something missing here: response format. Like the Range header, there's the "Accept" header the client can send, and I've also found it useful for very quick browser debugging to allow overriding that with a GET parameter.


So we should discuss the formats we support for the response:

application/json
application/xml
text/yaml
text/csv
text/html
application/x-latex
application/pdf

... and of course how we handle these. Json, XML, CSV are pretty straightforward (hey, are there any industry-specific XML formats we should leverage/offer?) -- for nested data in CSV I've typically seen Json used...

For those last 3, clearly there's a need for templates for each kind of object...

If we've done a good job on the API, we should be able to plug in request formatters and response formatters easily -- so we could add text/yaml by writing a new plugin for both response and request handling...

Ah, yes, and that's exactly why I think we need to support a GET parameter in addition to the Range: header -- then you can simply generate a URL to get a CSV or HTML report of the most recent 10 payments from client X.
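
The plug-in idea for response formatters might be sketched like this. The registry, the decorator and the `format` query parameter are all hypothetical names, and the Accept handling is deliberately naive (a single exact MIME type, no q-values).

```python
import csv
import io
import json
from typing import Callable, Dict, List, Mapping

Rows = List[Dict[str, object]]
FORMATTERS: Dict[str, Callable[[Rows], str]] = {}


def formatter(mime: str):
    """Register a response formatter plugin for one MIME type."""
    def register(func):
        FORMATTERS[mime] = func
        return func
    return register


@formatter("application/json")
def to_json(rows: Rows) -> str:
    return json.dumps(rows)


@formatter("text/csv")
def to_csv(rows: Rows) -> str:
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]) if rows else [])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def render(rows: Rows, headers: Mapping[str, str], query: Mapping[str, str]) -> str:
    # Assumption: a hypothetical 'format' query parameter overrides Accept.
    mime = query.get("format") or headers.get("Accept", "application/json")
    if mime not in FORMATTERS:
        raise LookupError(f"no formatter registered for {mime}")
    return FORMATTERS[mime](rows)


print(render([{"id": 1, "amount": "10.00"}], {}, {"format": "text/csv"}))
```

Adding text/yaml would then be one more decorated function, which is the property being argued for here.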

    Is PUT to be added to this list? I would expect PUT to update
    values of an existing object, and needs to contain all new values
    for the object. Obviously since we're doing financial
    transactions, this probably can only modify drafts and not
    anything posted (in a financial sense). But for drafts,
    reconciliation, batches, etc. this seems useful.


    POST or PATCH can be used for modifying just a field on an object,
    or handling things like payments on an object?

Ah. Good point. POST (with an rpc endpoint) would be for adding a payment to an open item. PATCH would be to change the values of an existing object which is still editable.


Ok. That all sounds fine to me...

    ATOMICITY
    =========

    When an API call affects multiple resources and the call returns
    an error, *none* of those resources are to be modified.
    Do we want the API to support a transaction, allow a bunch of
    operations to get batched with atomicity? e.g. failure after a
    series of web service calls rolls back the whole batch, if there
    are no errors entire batch gets committed?

    If we can support that, that seems like another big win...
Hmm. My initial reaction is that we can support most of this by attaching transactions to an (unapproved) batch and then offering the user the option to either approve or remove the entire batch. Would that work for the use-case you envision?

Yes, that's exactly what I'm envisioning. Although a different "batch" mechanism than our current batches -- something specific to the API that can do all sorts of data changes and then approve in one go...

Actually, thinking about it, I can see how to put it all into one transaction. However, whether that works depends on what you expect on subsequent calls within the same batch. Do you expect any queries to return the new values while they have not been fully committed to the database? Or do you just expect to send loads of modifications? Do you expect to be returned new IDs?

Good questions, and this gets beyond my experience -- I haven't actually done that much transactional programming to know the best practices here...

I would think we would expect subsequent calls to have the new values, and I do know that Dojo stores have supported "placeholder" ids that can be replaced with permanent ones after the data is committed, so I would tend to think that pattern should work, a "placeholder" that is returned while the batch is "open", and when the batch is "approved" a set of replacements get returned so the client can update with final IDs.
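
Roughly what I have in mind for the placeholder pattern, as a toy sketch: the class name, the `tmp-N` placeholder format and the way final ids are numbered are all invented for illustration, and nothing here touches a database.

```python
from typing import Dict, List


class OpenBatch:
    """Hands out placeholder ids until the batch is approved."""

    def __init__(self) -> None:
        self._pending: List[dict] = []

    def add(self, record: dict) -> str:
        """Queue a record and return its placeholder id."""
        self._pending.append(record)
        return f"tmp-{len(self._pending)}"

    def approve(self, next_id: int) -> Dict[str, int]:
        """Assign final ids and return {placeholder: final id}."""
        mapping = {
            f"tmp-{n}": next_id + n - 1
            for n in range(1, len(self._pending) + 1)
        }
        self._pending.clear()
        return mapping


batch = OpenBatch()
first = batch.add({"amount": 100})
second = batch.add({"amount": 250})
print(first, second, batch.approve(next_id=501))
```

The client keeps using the placeholders while the batch is open and swaps them for the returned final ids once it is approved.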


Should we be considering UUIDs here?

My basic idea was to batch up all RPC calls and delay them until the final "COMMIT" comes in, and then execute all the batched commands inside a single database transaction.

Hmm. Goes against REST, but then we are talking about financial systems, practically the definition of transactional logic. It feels like we are reinventing SOAP!

I'm thinking about the scenarios here, and the one that comes to mind is "shipping" some products on a sales order. We use this all the time -- skipping the shipping screen, we just put in a value in the "ship" box and "Create Invoice." The current LSMB adjusts the sales order line items/totals, and commits that, and then takes you to a create invoice form that is completely open, unsaved, and in my opinion really should be in a transaction -- the sales order qtys shouldn't get updated until the invoice is posted (or at least saved as a draft).

That's a scenario I think the current app should do in a transaction, and doesn't.

I am also thinking about how you do transactions in a database, that you generally have to start a transaction with a "BEGIN" and otherwise it's not in a transaction. I'm thinking we just model the API the same way, that it's not in a transaction unless it's explicitly called for.

I also think this entire transaction functionality can be deferred until a later version, as long as you're thinking about it with the current version so it's something that can be added later...
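
To illustrate the "queue everything, run it in one transaction on COMMIT" idea, here is a small sketch. It uses an in-memory SQLite database purely as a stand-in (LedgerSMB itself runs on PostgreSQL), and the table, the function name and the queued commands are made up.

```python
import sqlite3
from typing import List, Tuple

Command = Tuple[str, tuple]  # (SQL text, parameters)


def commit_batch(conn: sqlite3.Connection, commands: List[Command]) -> bool:
    """Run all queued commands in one transaction; roll back on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in commands:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
)
conn.commit()

ok = commit_batch(conn, [
    ("INSERT INTO payment (amount) VALUES (?)", (100,)),
    ("INSERT INTO payment (amount) VALUES (?)", (None,)),  # violates NOT NULL
])
count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(ok, count)
```

The second insert fails, so the first one is rolled back with it and the table stays empty. That is the all-or-nothing behaviour the ATOMICITY section asks for, just stretched over several queued calls.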

    SESSION MANAGEMENT
    ====================

    The API user logs in by creating a new session through the
    /api/<version>/session/ API. Each application login (including
    API logins) is attached to an application user. The webservice
    caller thereby identifies itself as an application user/employee.
    Currently, credentials will be provided through basic auth on the
    first *and* all following requests. Session replay attacks are
    prevented by sending cookies back and forth; just as they are
    now. Each request should provide the cookies created during the
    session; possibly updated by the response of the last request --
    basic cookie management.
    At the end of a session, the session is to be removed by issuing
    a DELETE request on the session resource instance.

    Regardless of whether the response generated by the server is a
    failure or a success, the session cookies should be updated on
    each request. The client must respect cookie updates regardless
    of the type of response.



    Hmm. What if the same client is running multiple, parallel
    transactions? How would we handle race conditions here? Is it
    possible for the same session to have multiple sequences?

Good point, but it seems to work for PHP, RoR, ... I'll look around and try to find how others solve it. Maybe by opening a second session?

I think the general approach is a token sent in the body, not a cookie. The browser will send all cookies in any session... You can probably go to some extra steps to isolate sessions with curl, but I mostly just use the "cookiejar" in curl that makes it act like a browser here...

IIRC, the Drupal Form API sets a "form-build-id" and a "form-token". The build-id essentially is the session/form sent to the browser, and the token is used to validate and detect replays.


Drupal then caches the entire form using the build-id as the cache key.
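
As a generic illustration of a body token that detects replays (not a description of how Drupal implements it), something like the following would do. The class and method names are hypothetical, and the in-memory set stands in for whatever session storage we would really use.

```python
import secrets


class FormTokens:
    """Issue single-use tokens; a token seen twice is treated as a replay."""

    def __init__(self) -> None:
        self._outstanding = set()

    def issue(self) -> str:
        """Create a fresh random token and remember it as unused."""
        token = secrets.token_hex(16)
        self._outstanding.add(token)
        return token

    def consume(self, token: str) -> bool:
        """Accept a token exactly once; unknown or reused tokens are rejected."""
        if token in self._outstanding:
            self._outstanding.remove(token)
            return True
        return False


tokens = FormTokens()
token = tokens.issue()
print(tokens.consume(token), tokens.consume(token))
```

Because each request carries its own token, parallel requests from the same client do not have to share one cookie sequence.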


    Alternatively, API calls can be invoked from sessions originally
    authenticated against /login.pl?action=authenticate (with the same
    further requirements as above).


    ENCODING OF VALUES
    ==================

    Each of the supported formats needs its own design document
    which specifies how to encode specific values. While this
    has been mostly handled for JSON, there's a missing data point
    with respect to encoding dates. Dojo handles encoding dates from
    the client to the server, but I've been unable to find if/how
    Dojo's JSON can deserialize dates coming from the server.

    Yes it can... the dojo/date functionality works both ways -- I
    would suggest we deserialize to a Javascript object in the store
    functions themselves, this works pretty well.

Well, agreed that at least it *used* to do it: in the dojo/data docs there's mention of *serializing* (but I couldn't find any mention of deserializing). In the new dojo/(d)store, there's nothing in the documentation that I could find. But, indeed, the only correct place to deserialize dates into date objects does seem to be in the stores.


http://dojotoolkit.org/reference-guide/1.10/dojo/date/locale/parse.html

    NESTING OF RESOURCES
    =====================

    When obtaining a resource from the server, the serving webservice
    may include embedded in its response objects that it refers to;
    e.g. the server may decide to include address data included in a
    response to a query for a customer. The server isn't required to
    include more than just the key by which the resource can be
    queried out of the resource collection.

    Nested resources in the URL space (such as the GitLab example
    with team members in a project [2]).
    *** Nested resources like the GitLab example pollute the
    namespace, because there's a two way correspondence:
    users-in-project and projects-in-user. *** How to handle this in
    the way that creates the least complexity??? *** Presumably, we
    want things to be layered, building complex resources on simple
    ones; so it's problematic in the gitlab example to make the user
    aware of the projects... ***
    We should support and default to "obvious" nested resources. e.g.
    line items on an invoice, payment lines, etc.
Do you mean that these nested resources should be made available at the URL level? Or simply *always* be embedded in the response object? Basically, I wasn't thinking of the journal lines as individual resources. I think the *journal* is the individual resource, with a number of lines "inside" it. Would it make more sense to you to make the individual lines into resources too? [I can see reasoning for that too, because it allows running queries on the journal-line resource and filter out all lines on e.g. a single account...]

Well... yes. I think this boils down to a question of "document database" or "relational database". Obviously, we're built on a relational database, and I've never truly warmed up to pure document/object storage, the "NoSQL" movement... At the same time, the structure of an invoice in LSMB is pretty well-defined, and doesn't vary much, so we can present the entire thing as a "document", even though the lines are themselves first-class objects.

Maybe this is just force of habit for me, and there may well not be any actual need for it, but I would think that pretty much anything that can be a line on a report or an invoice should be directly addressable. But maybe that's overkill?

I've built one very complex system from scratch, and with that one I just made each level of the hierarchy extend the base data object class, and so I did essentially get the basic CRUD APIs for this for free, once I mapped my API layer to the data object -- about the only thing that needed attention at each level were the fields available for index queries -- and then the nesting issues we're discussing. I guess I didn't think that much about whether we *needed* that level of access (though it certainly helped when debugging).


    I do think we should plan to allow the client to request what data
    to nest, perhaps either a custom header or a parameter (or both)?
    This would be one area that needs to be self-documenting, what
    resources can be excluded/included/expanded in which requests, and
    what is included by default.

I like that. I'll think about how we can model this.

As I think about it, I really only see two levels here: expanded, or condensed. Expanded, for an invoice, the response would include the customer record, each line item detail, each payment line. Condensed, it would only contain references to these other records, which would have to be retrieved separately if they don't exist.
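
A tiny sketch of the two levels, using made-up in-memory data. The field names, the `expand` flag and the `{"id": ...}` reference shape are assumptions for illustration, not LedgerSMB's actual schema.

```python
from typing import Dict

# Hypothetical in-memory data; the field names are not LedgerSMB's schema.
CUSTOMERS: Dict[int, dict] = {7: {"id": 7, "name": "Example Customer"}}
INVOICES: Dict[int, dict] = {42: {"id": 42, "customer": 7, "total": 300}}


def get_invoice(invoice_id: int, expand: bool = False) -> dict:
    """Return the invoice with the customer as a reference or embedded."""
    invoice = dict(INVOICES[invoice_id])  # copy so the store is untouched
    if expand:
        # Expanded: embed the full customer record.
        invoice["customer"] = dict(CUSTOMERS[invoice["customer"]])
    else:
        # Condensed: only a reference the client can fetch separately.
        invoice["customer"] = {"id": invoice["customer"]}
    return invoice


print(get_invoice(42))
print(get_invoice(42, expand=True))
```

Line items and payment lines would follow the same pattern; whether the switch is a header or a query parameter is still open.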


How much deeper is useful to go?

Would we ever want to load the product from the line item? Perhaps, and then need to look up a pricegroup for a customer for the product... not exactly sure how this is currently modeled. But that really seems as complicated as this system gets. Oh, I guess there's entity/eca/contact method.

Thanks for your response! With a few times going back-and-forth, we can probably have something we can start working with and build our experience in the context of *this* application.


Sounds good!

Cheers,
John

------------------------------------------------------------------------------

_______________________________________________
Ledger-smb-devel mailing list
Ledger-smb-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ledger-smb-devel
