I think these problems can be mitigated if the CSV format is strictly defined, 
as I specified in my previous message.

In particular, the parser must recognize exactly one specific header line that 
contains a version number, or abort. I still insist on quoting the labels with 
double quotes, introducing a 3rd column with specific string or numeric types, 
and replacing all the special characters in the input/output with ":".

Strictly defining the CSV version and, consequently, the fields, and then 
specifying which kinds of data the import must fail on, will limit the 
complexity of importers to N different switch cases, where N is the number of 
circulating versions of the format (for now 1).
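
As a rough illustration of such a strict importer, here is a minimal Python 
sketch. Everything specific in it is an assumption made for the example: the 
header text "wallet-labels-v1,label,type", the type names, and all function 
names are placeholders, not part of any agreed format. It only shows the idea 
of matching one exact header per version and aborting on anything else.

```python
import csv
import io

# Hypothetical header for version 1; the real text would be fixed by the spec.
HEADER_V1 = ("wallet-labels-v1", "label", "type")
# Hypothetical values for the 3rd (type) column.
ALLOWED_TYPES = {"tx", "addr", "input", "output"}


class LabelFormatError(ValueError):
    """Raised when the file does not match a known format version exactly."""


def parse_v1(rows):
    """Parse the records that follow a version 1 header, failing on any deviation."""
    records = []
    for line_no, row in enumerate(rows, start=2):  # the header was line 1
        if len(row) != 3:
            raise LabelFormatError(f"line {line_no}: expected 3 fields, got {len(row)}")
        reference, label, kind = row
        if kind not in ALLOWED_TYPES:
            raise LabelFormatError(f"line {line_no}: unknown type {kind!r}")
        records.append((reference, label, kind))
    return records


# One entry per circulating version of the format (for now 1).
PARSERS = {HEADER_V1: parse_v1}


def parse_labels(text):
    """Dispatch on the exact header line, or abort."""
    try:
        reader = csv.reader(io.StringIO(text), strict=True)
        header = next(reader, None)
        parser = PARSERS.get(tuple(header)) if header is not None else None
        if parser is None:
            raise LabelFormatError("unrecognized header line; refusing to import")
        return parser(reader)
    except csv.Error as exc:
        # Malformed quoting and similar problems are also hard failures.
        raise LabelFormatError(str(exc)) from exc
```

Supporting a second version of the format would then mean adding one more 
entry to PARSERS, which is the "N switch cases" mentioned above.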

- Ali

On Thu, 25 Aug 2022 13:48:36 +0000, rha...@protonmail.com wrote:
> > Not only is JSON limited to editing only through specific software or text 
> > editors, but (in the latter case) it is fragile enough that a single 
> > missing character can cause an entire file to fail parsing. CSV is more 
> > forgiving in this regard.
>
> I think quite simply: A forgiving format is not appropriate for a standard.
>
> It'd be hard to overstate how much extra and pointless effort it creates for 
> everyone, and every implementation ends up creating its own de facto standard 
> for what it produces and accepts. Even doing something as simple as adding an 
> extra column will not be possible in the future because it'll break 
> compatibility with previous parsers.
>
> I've literally worked on projects where the csv parser has evolved to scan 
> ahead and use heuristics to understand the "rules" of a csv file, and then do 
> line-by-line heuristics to override those rules in pathological cases. That 
> makes a bit of sense when you're trying to achieve 30 years of backwards 
> compatibility. It doesn't make sense for much else.
>
> If your application users really like csv, then introduce an 
> application-specific import-from-csv and export-to-csv with your own rules.
> -Ryan
>
> ------- Original Message -------
> On Thursday, August 25th, 2022 at 1:59 AM, Craig Raw <craig...@gmail.com> 
> wrote:
>
> > Thanks for your thoughts Ryan.
> >
> > Without reference to the quality feedback on this proposal, I was aware 
> > when submitting it for review that it provides an excellent opportunity for 
> > bike shedding. As developers, we have all experienced frustration with data 
> > formats. One thing that I did not perhaps make clear enough is that this 
> > format is not solely intended for developers, but general users who are 
> > probably not well represented on this list.
> >
> > While doing research for this proposal I spoke to several professional 
> > users of Sparrow Wallet (who are not developers). They all expressed a 
> > desire for the format to integrate with their business processes, which are 
> > driven by business tools such as Excel. Labelling provides an important 
> > function in UTXO and address management in these scenarios, and needs to be 
> > accessible and manageable outside of wallet software.
> >
> > If this is to be achieved, it immediately rules out JSON as a data format. 
> > Not only is JSON limited to editing only through specific software or text 
> > editors, but (in the latter case) it is fragile enough that a single 
> > missing character can cause an entire file to fail parsing. CSV is more 
> > forgiving in this regard. With respect to your comments on escaping, my 
> > expectation would be that developers will be using a mature CSV library 
> > rather than handling character escaping themselves. I would rather propose 
> > a format that is generally usable, even if occasionally a label is escaped 
> > incorrectly.
> >
> > Finally, I'll note that CSV files are already common and uncontroversial in 
> > Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many 
> > others) already export addresses and/or transactions with their labels as 
> > CSV files. This proposal simply attempts to create a standard for importing 
> > and exporting all the labels in a wallet.
> >
> > Craig
> >
> > On Wed, Aug 24, 2022 at 9:01 PM <rha...@protonmail.com> wrote:
> >
> > >> I'd strongly suggest not using CSV. Especially for a standard. I've worked 
> > >> with it as an interchange format many times, and it's always been a 
> > >> clusterfuck.
> >>
> >> Right off the bat, you have stuff like "The fields may be quoted, but this 
> >> is unnecessary as the first comma in the line will always be the 
> >> delimiter" which invariably leads to some implementations doing it, some 
> >> implementations not doing it, and others that are intolerant of the other 
> >> way.
> >>
> > >> And you have also made the classic mistake of not strictly defining escape 
> > >> rules. So everyone will pick their own (e.g. some will escape commas as \,, 
> > >> others will not because the field is quoted and will escape quotes instead, 
> > >> and others will assume no escaping is required since it's the last column 
> > >> in a csv).
> >>
> >> Over time it morphs into its own mini-monster that introduces so much pain.
> >>
> >> On a similar note, allowing alternatives (like: txid>index vs txid:index) 
> >> provides no benefit, but creates additional work for implementations (who 
> >> quite likely only test formats they produce) and future incompatibilities.
> >>
> >> I know everyone loves to hate on it, but really (line-separated?) json is 
> >> the way to go.
> >>
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
> >>
> >> -Ryan
> >>
> >> ------- Original Message -------
> >> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev 
> >> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I would like to propose a BIP that specifies a format for the export and 
> >>> import of labels from a wallet. While transferring access to funds across 
> >>> wallet applications has been made simple through standards such as BIP39, 
> >>> wallet labels remain siloed and difficult to extract despite their value, 
> >>> particularly in a privacy context.
> >>>
> >>> The proposed format is a simple two column CSV file, with the reference 
> >>> to a transaction, address, input or output in the first column, and the 
> >>> label in the second column. CSV was chosen for its wide accessibility, 
> >>> especially to users without specific technical expertise. Similarly, the 
> >>> CSV file may be compressed using the ZIP format, and optionally encrypted 
> >>> using AES.
> >>>
> >>> The full text of the BIP can be found at 
> >>> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki 
> >>> and also copied below.
> >>>
> >>> Feedback is appreciated.
> >>>
> >>> Thanks,
> >>> Craig Raw
> >>>
> >>> ---
> >>>
> >>> <pre>
> >>> BIP: wallet-labels
> >>> Layer: Applications
> >>> Title: Wallet Labels Export Format
> >>> Author: Craig Raw <cr...@sparrowwallet.com>
> >>> Comments-Summary: No comments yet.
> >>> Comments-URI: 
> >>> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >>> Status: Draft
> >>> Type: Informational
> >>> Created: 2022-08-23
> >>> License: BSD-2-Clause
> >>> </pre>
> >>>
> >>> ==Abstract==
> >>>
> > >>> This document specifies a format for the export of labels that may be 
> > >>> attached to the transactions, addresses, inputs and outputs in a wallet.
> >>>
> >>> ==Copyright==
> >>>
> >>> This BIP is licensed under the BSD 2-clause license.
> >>>
> >>> ==Motivation==
> >>>
> >>> The export and import of funds across different Bitcoin wallet 
> >>> applications is well defined through standards such as BIP39, BIP32, 
> >>> BIP44 etc.
> >>> These standards are well supported and allow users to move easily between 
> >>> different wallets.
> >>> There is, however, no defined standard to transfer any labels the user 
> >>> may have applied to the transactions, addresses, inputs or outputs in 
> >>> their wallet.
> >>> The UTXO model that Bitcoin uses makes these labels particularly valuable 
> >>> as they may indicate the source of funds, whether received externally or 
> >>> as a result of change from a prior transaction.
> >>> In both cases, care must be taken when spending to avoid undesirable 
> >>> leaks of private information.
> >>> Labels provide valuable guidance in this regard, and have even become 
> >>> mandatory when spending in several Bitcoin wallets.
> >>> Allowing users to export their labels in a standardized way ensures that 
> >>> they do not experience lock-in to a particular wallet application.
> >>> In addition, by using common formats, this BIP seeks to make manual or 
> >>> bulk management of labels accessible to users without specific technical 
> >>> expertise.
> >>>
> >>> ==Specification==
> >>>
> >>> In order to make the import and export of labels as widely accessible as 
> >>> possible, this BIP uses the comma separated values (CSV) format, which is 
> >>> widely supported by consumer, business, and scientific applications.
> >>> Although the technical specification of CSV in RFC4180 is not always 
> >>> followed, the application of the format in this BIP is simple enough that 
> >>> compatibility should not present a problem.
> >>> Moreover, the simplicity and forgiving nature of CSV (over for example 
> >>> JSON) lends itself well to bulk label editing using spreadsheet and text 
> >>> editing tools.
> >>>
> >>> A CSV export of labels from a wallet must be a UTF-8 encoded text file, 
> >>> containing one record per line, with records containing two fields 
> >>> delimited by a comma.
> >>> The fields may be quoted, but this is unnecessary, as the first comma in 
> >>> the line will always be the delimiter.
> >>> The first line in the file is a header, and should be ignored on import.
> >>> Thereafter, each line represents a record that refers to a label applied 
> >>> in the wallet.
> >>> The order in which these records appear is not defined.
> >>>
> >>> The first field in the record contains a reference to the transaction, 
> >>> address, input or output in the wallet.
> >>> This is specified as one of the following:
> >>> * Transaction ID (<tt>txid</tt>)
> >>> * Address
> >>> * Input (rendered as <tt>txid<index</tt>)
> >>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> >>>
> >>> The second field contains the label applied to the reference.
> >>> Exporting applications may omit records with no labels or labels of zero 
> >>> length.
> >>> Files exported should use the <tt>.csv</tt> file extension.
> >>>
> >>> In order to reduce file size while retaining wide accessibility, the CSV 
> >>> file may be compressed using the ZIP file format, using the <tt>.zip</tt> 
> >>> file extension.
> >>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 
> >>> or AES-256 encryption, which is supported by numerous applications 
> >>> including Winzip and 7-zip.
> >>> In order to ensure that weak encryption does not proliferate, importers 
> >>> following this standard must refuse to import <tt>.zip</tt> files 
> >>> encrypted with the weaker Zip 2.0 standard.
> >>> The textual representation of the wallet's extended public key (as 
> >>> defined by BIP32, with an <tt>xpub</tt> header) should be used as the 
> >>> password.
> >>>
> >>> ==Importing==
> >>>
> >>> When importing, a naive algorithm may simply match against any reference, 
> >>> but it is possible to disambiguate between transactions, addresses, 
> >>> inputs and outputs.
> >>> For example in the following pseudocode:
> >>> <pre>
> > >>> if reference length < 64
> > >>>     Set address label
> > >>> else if reference length == 64
> > >>>     Set transaction label
> > >>> else if reference contains '<'
> > >>>     Set input label
> > >>> else
> > >>>     Set output label
> >>> </pre>
> >>>
> >>> Importing applications may truncate labels if necessary.
> >>>
> >>> ==Test Vectors==
> >>>
> >>> The following fragment represents a wallet label export:
> >>> <pre>
> >>> Reference,Label
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > >>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> >>> </pre>
> >>>
> >>> ==Reference Implementation==
> >>>
> >>> TBD
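
For reference, the disambiguation pseudocode in the quoted draft's Importing 
section maps onto a few lines of Python. This is only a sketch: the function 
name classify_reference is made up for illustration, and it checks nothing 
beyond the length and the '<' character, exactly like the pseudocode. In 
particular, it does not validate that the reference is a well-formed address 
or txid.

```python
def classify_reference(reference: str) -> str:
    """Mirror the draft's pseudocode; no validation of the reference itself."""
    if len(reference) < 64:
        return "address"
    if len(reference) == 64:
        return "transaction"
    if "<" in reference:
        return "input"
    # Both txid>index and txid:index end up here.
    return "output"
```

Note that anything longer than 64 characters without a '<' is treated as an 
output, so a stricter importer would want to reject malformed references 
before classifying them.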

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
