Ah, that makes sense, thank you! Any recommendation on which classification
algorithm works well?

On Tue, Dec 21, 2021 at 5:29 AM Daniele Nicolodi <[email protected]> wrote:

> On 21/12/2021 00:55, Aaron Stacy wrote:
> > Hi, I'm looking for suggestions for categorizing spending (not so much
> > things like paycheck, brokerage transactions, etc, but stuff like credit
> > card spending for budgeting). My ledger has around 2800 transactions
> > over about 2 years, so it's not a ton of data, but it seems like enough
> > that I could leverage something smarter than just string matching
> > the transaction narrations.
> >
> > Does anyone have recommendations for categorizing spending?
> >
> > I'm thinking of applying a full text search index as follows:
> >
> > - Each expense account is a "document".
> > - The document's contents are the narrations of every transaction for
> > that account.
> > - To categorize a new transaction, use an engine like Lucene
> > <https://lucene.apache.org> or sklearn's TfidfVectorizer and pick the
> > most likely account.
> >
> > Any thoughts on this approach? (aside from being over-engineered. I'm an
> > engineer, IDK what to tell you it's what I do)
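The full-text idea above can be sketched with sklearn alone. This is only a rough illustration under stated assumptions, not a tested setup: the sample narrations, the account names, and the helpers build_index and suggest_account are all made up for the example. It builds one "document" per account, fits TfidfVectorizer on those documents, and ranks accounts by cosine similarity to the new narration.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample data: (narration, expense account) pairs from a ledger.
HISTORY = [
    ("WHOLE FOODS MARKET 123", "Expenses:Groceries"),
    ("TRADER JOES 552", "Expenses:Groceries"),
    ("SHELL OIL 4471", "Expenses:Auto:Fuel"),
    ("CHEVRON GAS STATION", "Expenses:Auto:Fuel"),
    ("NETFLIX.COM", "Expenses:Entertainment"),
]


def build_index(history):
    """Build one text document per account and fit a TF-IDF matrix on them."""
    docs = defaultdict(list)
    for narration, account in history:
        docs[account].append(narration)
    accounts = sorted(docs)
    vectorizer = TfidfVectorizer()
    # One row per account, in the same order as `accounts`.
    matrix = vectorizer.fit_transform([" ".join(docs[a]) for a in accounts])
    return vectorizer, matrix, accounts


def suggest_account(narration, vectorizer, matrix, accounts):
    """Return the account whose document is most similar, plus its score."""
    query = vectorizer.transform([narration])
    scores = cosine_similarity(query, matrix)[0]
    best = scores.argmax()
    return accounts[best], float(scores[best])


if __name__ == "__main__":
    vectorizer, matrix, accounts = build_index(HISTORY)
    print(suggest_account("SHELL OIL 9981", vectorizer, matrix, accounts))
```

A score of 0.0 means the narration shares no known token with any account, so the suggestion is meaningless in that case and should be ignored.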
>
> I use Beancount, and to assign accounts to transactions I use a machine
> learning classifier, implemented with sklearn and trained on my existing
> ledger.
>
> This works reasonably well for recurring transactions but is not
> infallible. I found that putting a threshold on the confidence score
> from the classifier is essential for not ending up with completely bogus
> account assignments.
>
> Cheers,
> Dan
>
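For reference, the confidence threshold described above might look roughly like the sketch below. The message does not say which classifier is used, so the TF-IDF plus LogisticRegression pipeline here is purely an assumption, as are the sample data, the helper names train and classify, and the 0.6 default threshold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data taken from an existing ledger.
NARRATIONS = [
    "WHOLE FOODS MARKET 123",
    "TRADER JOES 552",
    "SHELL OIL 4471",
    "CHEVRON GAS STATION",
    "NETFLIX.COM",
]
ACCOUNTS = [
    "Expenses:Groceries",
    "Expenses:Groceries",
    "Expenses:Auto:Fuel",
    "Expenses:Auto:Fuel",
    "Expenses:Entertainment",
]


def train(narrations, accounts):
    """Fit a text classifier (assumed choice: TF-IDF + logistic regression)."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(narrations, accounts)
    return model


def classify(model, narration, threshold=0.6):
    """Return (account, confidence); account is None below the threshold."""
    probs = model.predict_proba([narration])[0]
    best = probs.argmax()
    confidence = float(probs[best])
    # Leave low-confidence transactions unassigned for manual review.
    account = model.classes_[best] if confidence >= threshold else None
    return account, confidence


if __name__ == "__main__":
    model = train(NARRATIONS, ACCOUNTS)
    print(classify(model, "SHELL OIL 9981"))
```

Transactions that come back with None stay uncategorized and get filled in by hand. A sensible threshold depends on the ledger, so it would need tuning against already-categorized transactions.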

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Ledger" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ledger-cli/CACjABkk3stsisCMOcWfmjoq438Lq65PvkQ%2B201mB8a_ZUXVTiw%40mail.gmail.com.
