Re: [elm-discuss] Immutable data design problem

Aaron VonderHaar Mon, 24 Jul 2017 23:02:46 -0700

Ah yes, a nice complicated system :)   If you're still looking for more
abstract suggestions, here's how I would approach the problem:


It sounds like you're trying to construct intermediate data structures that
directly map to certain domain concepts, and then you later have to
transform that data into a structure that you ultimately need.  I suspect
you may be able to clean it up a bit by focusing on what you need, and not
on what you think you ought to have.

At the boundaries of the system, you have structured input data (perhaps
being decoded from JSON?), possibly structured output data (perhaps being
sent out as JSON?), and the UI view.  In my opinion, those are the most
important types in your system.  Rather than trying to devise a data
structure that's somewhere in the middle of the input and output, I'd focus
on modeling the input and the output data structures in isolation, and then
try to figure out the shortest/most modular route for transforming the data
from one to the other.  If you work in this way, I think you'll tend to end
up with more modular functions (which will also benefit you later, as you
mentioned anticipating having to handle realtime updates to the data).
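
To make that a bit more concrete, here is a minimal sketch of what I mean by
modeling the two ends first and then writing the shortest route between
them. Everything in it is an assumption for illustration: `RawEvent`, `Row`,
the field names, and `toRows` are made up, and it assumes Elm 0.19 (for
`String.fromInt`).

```elm
module Boundary exposing (RawEvent, Row, toRows)

-- Hypothetical input shape, e.g. what a JSON decoder might produce.


type alias RawEvent =
    { id : Int
    , rollYear : Int
    , assessedValue : Int
    }



-- Hypothetical output shape: only what the view needs to display.


type alias Row =
    { rollYear : Int
    , label : String
    }



-- One small step of the transformation, usable on its own.


toRow : RawEvent -> Row
toRow event =
    { rollYear = event.rollYear
    , label =
        "Event "
            ++ String.fromInt event.id
            ++ ": "
            ++ String.fromInt event.assessedValue
    }



-- The whole route from input to output: sort by roll year, then convert.


toRows : List RawEvent -> List Row
toRows events =
    events
        |> List.sortBy .rollYear
        |> List.map toRow
```

Note that there is no intermediate "domain" type here at all; one would only
be introduced later if the transformation functions turn out to share a
natural grouping.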

In what you're describing, the following things stuck out to me:

> AssessmentEvents must in turn have been created by calling
createAssessmentEvent, which takes the independent fields of an
AssessmentEvent and creates the full record with the derived fields

This sounds like it might be premature optimization.  If you didn't already
try this, I'd suggest passing around the original raw fields instead and
exposing the functions that can compute the derived fields.  Furthermore,
try having those functions take only the things that are needed for that
exact calculation, rather than taking the entire AssessmentEvent record as
an input parameter.  Doing this will help expose the actual dependencies in
your data and avoid unnecessary coupling with the "AssessmentEvent" concept.
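
As a rough sketch of what I mean (the function names and the idea of an
"exemption" are hypothetical, since I don't know your actual derived
fields), each calculation takes only the values it needs rather than a
whole AssessmentEvent:

```elm
module Derived exposing (deltas, taxableValue)

-- Hypothetical derived field: needs two numbers, not a whole record.


taxableValue : Int -> Int -> Int
taxableValue assessedValue exemption =
    max 0 (assessedValue - exemption)



-- Differences between consecutive values along a chain.
-- Pairs each value with its successor, so a chain of n values
-- yields n - 1 deltas (and an empty or single-item chain yields none).


deltas : List Int -> List Int
deltas values =
    List.map2 (\previous current -> current - previous)
        values
        (List.drop 1 values)
```

Because `deltas` only asks for a list of numbers, the caller decides how the
chain is assembled, and the function can be tested without constructing any
events.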

> when outside code needed to get the delta value, it couldn't just have an
AssessmentEvent. It would have to have an AssessmentStore (or Parcel) and
an EventID

I would work backwards here and start with what data structure makes sense
in the view, and then write the code to generate that from the raw data,
and then see if there are logical groupings that make sense to refactor out
as data types/modules (as opposed to starting with the domain concepts you
think you are supposed to have and trying to write your view to work with
those).

Does your view just want a list of things to iterate through and display?
If so, it sounds like you want to give it a list of records that have all
the necessary data assembled so you can just iterate through it.  Or do you
have some kind of master-detail view where one view is showing the details
of a thing that is chosen in another view?  In that case you might want to
have the selector view produce an Id that's stored in the model and used to
later request a specific item to be calculated for the detail view.  Or
maybe different parts of the view show different pieces of information,
each of which is hard to compute?
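
For the master-detail case, a minimal sketch might look like the following.
The `Model`, `Event`, and `Detail` shapes and the `detailFor` function are
all assumptions for illustration, not something taken from your code:

```elm
module Selection exposing (Detail, Event, Model, detailFor)

import Dict exposing (Dict)


type alias Event =
    { name : String
    , assessedValue : Int
    }



-- The selector view only stores an Id in the model.


type alias Model =
    { events : Dict Int Event
    , selected : Maybe Int
    }



-- Hypothetical shape of what the detail view wants to display.


type alias Detail =
    { title : String
    , assessedValue : Int
    }



-- Computed on request for the currently selected Id.
-- Returns Nothing if nothing is selected or the Id is unknown.


detailFor : Model -> Maybe Detail
detailFor model =
    model.selected
        |> Maybe.andThen (\id -> Dict.get id model.events)
        |> Maybe.map
            (\event ->
                { title = event.name
                , assessedValue = event.assessedValue
                }
            )
```

The detail view then takes a `Maybe Detail` and never needs to know how the
events are stored.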

In either case, I'd try writing your view the way you want it, then write
the function to transform the data how you need it, and then decide whether
that function (or parts of it) make the most sense in the view module or in
one of the data structure modules.


Overall, my suspicion is that you might be trying to model specific domain
concepts that you are expecting to have but are possibly unnecessary for
what you need to do.  So it might be useful to try to cleanly model the
input data and the data you want to display, then write the functions to
map between them, and only then figure out how those functions map to your
business domain as you refactor.  Based on what you described, it sounds
like Parcel, AssessmentEvent, and Property are all getting quite
interconnected.  If you instead try to focus on transforming from input to
output as directly as possible, I think you'll end up with a system that's
easier to modify or reconfigure later.  (The downside is that it may seem
unnatural to people who are used to thinking in terms of the standard
business domain concepts.)

I'll note that the suggestions here match the way I personally like to
approach problems, which is to focus on iteratively discovering the
interfaces.  If that style doesn't match the way your team works, you
should disregard :)

Also, would be happy to look at some type annotations if you want to talk
more concretely.

On Mon, Jul 24, 2017 at 4:02 PM, Lyle Kopnicky <lylew...@gmail.com> wrote:

> Hi Aaron,
>
> Thanks for your thoughtful reply.
>
> The domain model is pretty complex, so it's hard to distill down to a few
> issues. There's a higher-level structure called a Parcel. That already
> contains, among other things, the list of AssessmentEvents. I have a
> function called createParcel that takes a record with a parcel number,
> initial owner, and list of AssessmentEvents. Those AssessmentEvents must in
> turn have been created by calling createAssessmentEvent, which takes the
> independent fields of an AssessmentEvent and creates the full record with
> the derived fields. However, there really are yet more fields that can't be
> derived by looking at a single AssessmentEvent in isolation. Some
> calculation has to be done by determining chains of them and computing
> deltas along the chain.
>
> Currently I have createParcel computing a Dict of assessmentEventsById
> (so, it's assuming some ID already exists on the AssessmentEvents, which is
> a separate issue). It also computes a list of roll years that are relevant
> to the assessment events, which involves some date math. It computes an
> ownership chain - that is, a list of date ranges and who owned the property
> during that time range. And finally it computes the list of assessment
> events that are effective for each roll year. Each assessment event might
> appear in the list for as many as two consecutive years, depending on its
> dates.
>
> Then there will have to be deltas calculated between the assessment events
> for a given roll year. The accounts will be created from those. And
> finally, one or two bills will be created from each account, depending on
> the type of assessment event. All of this will be completely deterministic,
> based on the initial seed data of assessment events. But I need these
> accounts and bills calculated in order to properly view the data.
>
> If I am using IDs, then I can make a data structure that just contains the
> deltas by ID, rather than creating another AssessmentEvent structure that
> has room for the delta values. But that would mean that when outside code
> needed to get the delta value, it couldn't just have an AssessmentEvent. It
> would have to have an AssessmentStore (or Parcel) and an EventID and call a
> function which could use that to retrieve the delta value from the Dict.
> So, it's a pretty different model for the caller.
>
> So far I have been putting all this logic in one module, called Property.
> (The view logic is in a separate module.) I've been using datatypes with a
> single constructor, so the view code can pattern match against them. But
> now I'm starting to wonder whether it'd be safer to hide the
> representations here in the Property module.
>
> At some point in the future I will want to allow adding/removing/updating
> assessment events in real time. Then I will have to decide whether I want
> to just recalculate the entire set of data or try to figure out which bits
> need to change. Recalculating the whole thing will probably be performant
> enough. But I guess there could be an issue with IDs - if some data gets
> loaded from the database and needs to preserve existing IDs, I can't just
> generate new IDs for the whole set. I'll figure out that problem when I
> come to it.
>
> Regards,
> Lyle
>
> On Sunday, July 23, 2017 at 8:17:07 PM UTC-7, Aaron VonderHaar wrote:
>>
>> I'm not sure I understand all the details of your domain model, but it
>> seems like the notable point is that accounts are created implicitly as
>> assessment events occur, and you'd like to be able to, given an assessment
>> event, get the related accounts?
>>
>> I'd probably start with making a module (maybe called "AssessmentStore")
>> that has functions that describe what you need.  I'm thinking something
>> like:
>>
>> allEvents : AssessmentStore -> List AssessmentEvent
>>
>> and hmm... now that I write that out, it seems like that's all you want,
>> except that you ideally want AssessmentEvent to have a list of Accounts in
>> it.
>>
>> I think the approach I would prefer is similar to what you mention in
>> your last paragraph about keeping the data in separate structures, but you
>> question the safety of managing parallel structures.  If you create a
>> separate module to encapsulate the data, you can limit the need for
>> careful handling to that single module.  I might try something like this in
>> `AssessmentStore`:
>>
>> type AssessmentStore =
>>     AssessmentStore
>>         { assessmentEventInfo : Dict EventId { name : String, ... }
>>             -- This is not the full AssessmentEvent; just the things
>>             -- that don't relate to accounts.
>>         , accountsByEvent : Dict EventId (List AccountId)
>>         , accountInfo : Dict AccountId Account
>>         , allEvents : List EventId
>>             -- (or maybe you want them indexed differently, by time, etc)
>>         }
>>
>> then have a function to create the assessment store, and then the
>> `allEvents` function suggested above (or any other function to get
>> AssessmentEvents) can take the data in that private data structure and
>> merge it together to give the data that you actually want to return to the
>> caller.  In fact, you never need to expose the AccountIds/EventIds outside
>> of this module.
>>
>> If you are still worried about safety, you can add more unit tests to
>> this module, or try to define fuzz test properties to help you ensure that
>> you handle the computations correctly within the module.
>>
>> I've found this sort of approach to work well because it lets you
>> represent the data in whatever data structure is most performant and/or
>> appropriate for your needs (it is often also simpler to implement because
>> the data structures tend to be much flatter), but also hides the internal
>> representation behind a module interface so that you can still access the
>> data in whatever ways are most convenient for the calling code.
>>
>>
>>
>>
>> On Sun, Jul 23, 2017 at 7:16 PM, Lyle Kopnicky <lyle...@gmail.com> wrote:
>>
>>> I have a series of datatypes that have already been modeled in a
>>> relational database in a product. I'm trying to construct a lighter-weight
>>> in-memory representation in Elm for purposes of simulating operations and
>>> visualizing the results. Ultimately I will probably want to do some
>>> export/import operations that will allow someone to view data from the real
>>> database, or create records in a test database. But, I don't think the
>>> in-memory representations need to correspond exactly to the database ones
>>> in order to do this. I'd like to focus on as simple of a representation as
>>> possible, and I'm leaving out a fair number of fields.
>>>
>>> We start with a provided series of AssessmentEvents. It's just a limited
>>> amount of data for each AssessmentEvent. Some of the fields in the database
>>> can be calculated from the others, so those don't need to be provided. From
>>> this data, we can calculate more information about the AssessmentEvents,
>>> including deltas between them. We can also derive a series of Accounts in a
>>> completely deterministic fashion. Each AssessmentEvent will have up to two
>>> years associated with it, and for each year there will be at least one
>>> Account. From this we can also calculate one or two Bills to go with each
>>> Account.
>>>
>>> It's a fairly complex calculation. Certainly I can do it in Elm. But
>>> what I'm waffling about is how to store the data. These calculations can be
>>> cached - they do not need to be repeated if the user just changes their
>>> view of the data. They only need to be revised if the user wants to
>>> insert/edit/update AssessmentEvents. So to do all these calculations every
>>> time the user shifts the view would be wasteful.
>>>
>>> It becomes tricky with immutable data. In an object-oriented program, I
>>> would probably just have, say, extra empty fields on the AssessmentEvent
>>> object, that I would fill in as I updated the object. E.g., it could have a
>>> list of accounts, which initially would be a null value until I filled it
>>> in.
>>>
>>> At first I thought I might do something similar in the Elm data
>>> structure. An AssessmentEvent can contain a List of Accounts (I'm
>>> oversimplifying as it really needs to list the accounts per year). The list
>>> of Accounts can be initially empty. Then as I calculate the accounts, I can
>>> create a new list of AssessmentEvents that have Accounts in the list. But
>>> wait - since the list of AssessmentEvents is immutable, I can't change it.
>>> I can only create a new one, and then, where in the model do I put it?
>>>
>>> When a user initializes the model, then, what should they pass in?
>>> Perhaps they can pass in a list of AssessmentEvents that each have an empty
>>> list of Accounts, and then that gets stored in a variable. Then the
>>> Accounts are calculated, and we generate a new list of AssessmentEvents
>>> with Accounts attached, and that is what gets stored in the model.
>>>
>>> But this has some shortcomings. The user must now create something that
>>> has this extra unused field on it (and there will be more). I guess if they
>>> are using a function to create it, they needn't know that there are these
>>> extra fields. But what if the field isn't a list - it's an Int? Then do we
>>> need to make it a Maybe Int? Then all the code that later operates on that
>>> Int will have to handle the case that the Maybe Int might be a Nothing,
>>> even though at that point I know it will always be Just something.
>>>
>>> Maybe there should be a data structure that contains an AssessmentEvent,
>>> also containing the extra fields? But what if I have a series of functions,
>>> each of which adds some new field to the AssessmentEvent? Then I need a new
>>> data type for each step that just adds one more field?
>>>
>>> Perhaps if I use untagged records, then all the functions can just
>>> operate on the fields they care about, ignoring extra fields. I sort of
>>> liked the extra type safety that came with the tagged record, but it may
>>> just get in the way.
>>>
>>> Perhaps instead of attaching this extra data to AssessmentEvents, it
>>> could be kept in separate data structures? But then how do I know how they
>>> are connected? Unless I carefully manage the data in parallel arrays, I
>>> will need to add IDs to the AssessmentEvents, so they can be stored in a
>>> Dict.
>>>
>>> These are just some of my thoughts. Does anyone have any suggested
>>> patterns to follow?
>>>
>>> Thanks,
>>> Lyle
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Elm Discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elm-discuss...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

