Hi Lori,

Thank you for the speedy and detailed reply. I will take a crack at the 
ExperimentHub option and resubmit.

Cheers,
Tim

On Apr 8, 2024, at 8:34 AM, Kern, Lori <lori.sheph...@roswellpark.org> wrote:

Yes, we would recommend using ExperimentHub, which is a database with pointers 
to the data files. Files are downloaded only when necessary, which keeps the 
package lightweight for end users.

You have some options for where the data is stored. We encourage the use of 
Zenodo or other well-trusted data storage sites, but a Bioconductor-provided 
Microsoft data lake is also an option.

More documentation can be found at
https://bioconductor.org/packages/release/bioc/vignettes/HubPub/inst/doc/CreateAHubPackage.html

If you already have a data package, the only real changes are to remove the 
data from that package and host it at a trusted remote location, create the 
required inst/extdata/metadata.csv containing the information to add to the 
ExperimentHub database, and add the required biocViews to the DESCRIPTION file.
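
As a rough illustration of the metadata.csv step, the sketch below writes a 
one-row file. It is only a sketch under stated assumptions: the column names 
are listed as recalled from the ExperimentHub metadata documentation and 
should be checked against the vignette linked above, every value in 
`example_row` is a placeholder, and `build_metadata_csv` is a hypothetical 
helper. Hub packages usually generate this file from an R script, so Python 
is used here purely for illustration.

```python
import csv
import io

# Column names as recalled from the ExperimentHub metadata documentation
# (an assumption; verify against the CreateAHubPackage vignette).
COLUMNS = [
    "Title", "Description", "BiocVersion", "Genome", "SourceType",
    "SourceUrl", "SourceVersion", "Species", "TaxonomyId",
    "Coordinate_1_based", "DataProvider", "Maintainer", "RDataClass",
    "DispatchClass", "Location_Prefix", "RDataPath", "Tags",
]


def build_metadata_csv(rows):
    """Return metadata.csv text for a list of dicts keyed by COLUMNS."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        writer.writerow(row)
    return buf.getvalue()


# Every value below is a placeholder, not real sceptredata metadata.
example_row = {
    "Title": "example_dataset",
    "Description": "Placeholder description of the example dataset",
    "BiocVersion": "3.19",
    "Genome": "NA",
    "SourceType": "RDS",
    "SourceUrl": "https://example.org/source",
    "SourceVersion": "1.0",
    "Species": "Homo sapiens",
    "TaxonomyId": "9606",
    "Coordinate_1_based": "TRUE",
    "DataProvider": "Example Lab",
    "Maintainer": "Jane Doe <jane.doe@example.org>",
    "RDataClass": "data.frame",
    "DispatchClass": "Rds",
    "Location_Prefix": "https://example.org/",
    "RDataPath": "records/0000000/files/example_dataset.rds",
    "Tags": "ExampleData",
}

csv_text = build_metadata_csv([example_row])
print(csv_text)
```

The text in `csv_text` would be saved as inst/extdata/metadata.csv, with one 
row per remotely hosted file.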

Cheers,


Lori Shepherd - Kern
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Barry, 
Timothy P <tba...@hsph.harvard.edu>
Sent: Friday, April 5, 2024 4:22 PM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Moderately large files in an Experiment Data package?

Hello all,

I have initiated the submission of three packages to Bioconductor: 
sceptre <https://katsevich-lab.github.io/sceptre/> (an R package for 
perturb-seq analysis), ondisc <https://timothy-barry.github.io/ondisc/> (a 
companion R package to sceptre that implements new data structures for 
large-scale single-cell data), and 
sceptredata <https://github.com/Katsevich-Lab/sceptredata> (an experiment data 
package that provides example data for sceptre and ondisc). ondisc depends on 
sceptredata, and sceptre in turn depends on both ondisc and sceptredata. Our 
updated user manual <https://timothy-barry.github.io/sceptre-book/> describes 
how all three of these packages interface with one another.

In accordance with the Bioconductor submission instructions, I submitted the 
data package (i.e., sceptredata) first 
<https://github.com/Bioconductor/Contributions/issues/3386>. However, I 
received the following error message: "The package contains individual files 
over 5Mb in size. This is currently not allowed." Indeed, sceptredata contains 
two files that are 11 MB and one file that is 6 MB. The package stores example 
data in both the `data` directory and the `inst/extdata` directory.
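
For anyone wanting to find the offending files before resubmitting, here is a 
minimal sketch that walks a package directory and lists files above a size 
limit. The helper name `files_over_limit` is hypothetical, and the default 
threshold of 5,000,000 bytes is an assumption about how the "5Mb" in the error 
message is measured; adjust `limit_bytes` if the check uses a different 
definition.

```python
import os


def files_over_limit(root, limit_bytes=5 * 1000 * 1000):
    """Return sorted (relative_path, size_in_bytes) pairs for files under
    `root` whose size exceeds `limit_bytes`."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                # Report paths relative to the package root for readability.
                oversized.append((os.path.relpath(path, root), size))
    return sorted(oversized)


if __name__ == "__main__":
    # "." stands in for the package source directory (a placeholder).
    for rel_path, size in files_over_limit("."):
        print(f"{rel_path}: {size / 1e6:.1f} MB")
```

Run from the package root, this prints one line per file over the limit, 
which shows which files in `data` and `inst/extdata` would need to move to 
remote storage.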

I thought that experiment data packages were allowed to have larger files? If 
not, does anyone have a recommendation for how I should proceed? Kasper Hansen 
suggested ExperimentHub as a solution. Might that be the way to go?

Thank you greatly for the help!
Tim


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
