On 12.06.2018 17:24, Jens Axboe wrote:
> On 6/12/18 3:17 AM, Stefan Agner wrote:
>> [also added Jens Axboe]
>>
>> On 12.06.2018 10:27, Boris Brezillon wrote:
>>> On Tue, 12 Jun 2018 10:06:42 +0200
>>> Stefan Agner <ste...@agner.ch> wrote:
>>>
>>>> On 12.06.2018 02:03, Dmitry Osipenko wrote:
>>>>> On Monday, 11 June 2018 23:52:22 MSK Stefan Agner wrote:
>>>>>> Add support for the NAND flash controller found on NVIDIA
>>>>>> Tegra 2 SoCs. This implementation does not make use of the
>>>>>> command queue feature. Regular operations/data transfers are
>>>>>> done in PIO mode. Page read/writes with hardware ECC make
>>>>>> use of the DMA for data transfer.
>>>>>>
>>>>>> Signed-off-by: Lucas Stach <d...@lynxeye.de>
>>>>>> Signed-off-by: Stefan Agner <ste...@agner.ch>
>>>>>> ---
>>>>>>  MAINTAINERS                       |    7 +
>>>>>>  drivers/mtd/nand/raw/Kconfig      |    6 +
>>>>>>  drivers/mtd/nand/raw/Makefile     |    1 +
>>>>>>  drivers/mtd/nand/raw/tegra_nand.c | 1248 +++++++++++++++++++++++++++++
>>>>>>  4 files changed, 1262 insertions(+)
>>>>>>  create mode 100644 drivers/mtd/nand/raw/tegra_nand.c
>>>>>>
>>>> [snip]
>>>>>> +static int tegra_nand_cmd(struct nand_chip *chip,
>>>>>> +                         const struct nand_subop *subop)
>>>>>> +{
>>>>>> +        const struct nand_op_instr *instr;
>>>>>> +        const struct nand_op_instr *instr_data_in = NULL;
>>>>>> +        struct tegra_nand_controller *ctrl = 
>>>>>> to_tegra_ctrl(chip->controller);
>>>>>> +        unsigned int op_id, size = 0, offset = 0;
>>>>>> +        bool first_cmd = true;
>>>>>> +        u32 reg, cmd = 0;
>>>>>> +        int ret;
>>>>>> +
>>>>>> +        for (op_id = 0; op_id < subop->ninstrs; op_id++) {
>>>>>> +                unsigned int naddrs, i;
>>>>>> +                const u8 *addrs;
>>>>>> +                u32 addr1 = 0, addr2 = 0;
>>>>>> +
>>>>>> +                instr = &subop->instrs[op_id];
>>>>>> +
>>>>>> +                switch (instr->type) {
>>>>>> +                case NAND_OP_CMD_INSTR:
>>>>>> +                        if (first_cmd) {
>>>>>> +                                cmd |= COMMAND_CLE;
>>>>>> +                                writel_relaxed(instr->ctx.cmd.opcode,
>>>>>> +                                               ctrl->regs + CMD_REG1);
>>>>>> +                        } else {
>>>>>> +                                cmd |= COMMAND_SEC_CMD;
>>>>>> +                                writel_relaxed(instr->ctx.cmd.opcode,
>>>>>> +                                               ctrl->regs + CMD_REG2);
>>>>>> +                        }
>>>>>> +                        first_cmd = false;
>>>>>> +                        break;
>>>>>> +                case NAND_OP_ADDR_INSTR:
>>>>>> +                        offset = nand_subop_get_addr_start_off(subop, 
>>>>>> op_id);
>>>>>> +                        naddrs = nand_subop_get_num_addr_cyc(subop, 
>>>>>> op_id);
>>>>>> +                        addrs = &instr->ctx.addr.addrs[offset];
>>>>>> +
>>>>>> +                        cmd |= COMMAND_ALE | COMMAND_ALE_SIZE(naddrs);
>>>>>> +                        for (i = 0; i < min_t(unsigned int, 4, naddrs); 
>>>>>> i++)
>>>>>> +                                addr1 |= *addrs++ << (BITS_PER_BYTE * 
>>>>>> i);
>>>>>> +                        naddrs -= i;
>>>>>> +                        for (i = 0; i < min_t(unsigned int, 4, naddrs); 
>>>>>> i++)
>>>>>> +                                addr2 |= *addrs++ << (BITS_PER_BYTE * 
>>>>>> i);
>>>>>> +                        writel_relaxed(addr1, ctrl->regs + ADDR_REG1);
>>>>>> +                        writel_relaxed(addr2, ctrl->regs + ADDR_REG2);
>>>>>> +                        break;
>>>>>> +
>>>>>> +                case NAND_OP_DATA_IN_INSTR:
>>>>>> +                        size = nand_subop_get_data_len(subop, op_id);
>>>>>> +                        offset = nand_subop_get_data_start_off(subop, 
>>>>>> op_id);
>>>>>> +
>>>>>> +                        cmd |= COMMAND_TRANS_SIZE(size) | COMMAND_PIO |
>>>>>> +                                COMMAND_RX | COMMAND_A_VALID;
>>>>>> +
>>>>>> +                        instr_data_in = instr;
>>>>>> +                        break;
>>>>>> +
>>>>>> +                case NAND_OP_DATA_OUT_INSTR:
>>>>>> +                        size = nand_subop_get_data_len(subop, op_id);
>>>>>> +                        offset = nand_subop_get_data_start_off(subop, 
>>>>>> op_id);
>>>>>> +
>>>>>> +                        cmd |= COMMAND_TRANS_SIZE(size) | COMMAND_PIO |
>>>>>> +                                COMMAND_TX | COMMAND_A_VALID;
>>>>>> +
>>>>>> +                        memcpy(&reg, instr->ctx.data.buf.out + offset, 
>>>>>> size);
>>>>>> +                        writel_relaxed(reg, ctrl->regs + RESP);
>>>>>> +
>>>>>> +                        break;
>>>>>> +                case NAND_OP_WAITRDY_INSTR:
>>>>>> +                        cmd |= COMMAND_RBSY_CHK;
>>>>>> +                        break;
>>>>>> +
>>>>>> +                }
>>>>>> +        }
>>>>>> +
>>>>>> +        cmd |= COMMAND_GO | COMMAND_CE(ctrl->cur_cs);
>>>>>> +        writel_relaxed(cmd, ctrl->regs + COMMAND);
>>>>>> +        ret = wait_for_completion_io_timeout(&ctrl->command_complete,
>>>>>> +                                             msecs_to_jiffies(500));
>>>>>
>>>>> It's not obvious to me whether _io_ variant is appropriate to use here, 
>>>>> would
>>>>> be nice if somebody could clarify that. Maybe block/ already does the IO
>>>>> accounting itself and hence the IO time would be counted twice in that 
>>>>> case.
>>>>
>>>> Good that you bring this up.
>>>>
>>>> I don't think that there is any higher layer which could take care of
>>>> accounting. Usually, with raw nand there is no block layer involved
>>>> anyway.
>>>>
>>>> In a quick test it seems that only when using wait_for_completion_io I/O
>>>> is properly accounted in the "wait" section of top.
>>>>
>>>> So far only a single driver (omap2) used the _io variant, but I think it
>>>> is the right thing to do! After all, it is I/O...
>>>>
>>>> Boris or any other MTD maintainer, any comment on this?
>>>
>>> Given this definition of io_schedule_timeout() [1] (which is used when
>>> you call wait_for_completion_io_timeout()), I'd say it's not useful to
>>> use the _io_ version, simply because MTD devs are not exposed as blk
>>> devices, and thus don't need the blk_schedule_flush_plug() that is done
>>> is io_schedule_prepare(). But that also means MTD I/Os are not
>>> accounted as I/Os :-(.
>>
>> Documentation of wait_for_completion_io says:
>> "The caller is accounted as waiting for IO (which traditionally means
>> blkio only)."
>>
>> Which sounds as if it using _io is only an accounting thing...
> 
> Yes, you should only use it for waiting for IO off a system call
> read path. So block IO, or file system IO. Don't use it for internal
> IO that isn't related to that.

I guess that would be the case here, since MTD page read/writes are
typically file system IOs (e.g. UBIFS).

The problem is just that is not block related at all since it uses the
MTD subsystem... And it seems that the _io variants besides accounting,
also take a role in the block subsystems device plugging mechanism. What
is unclear to me if using the _io variant from the MTD subsystem
potentially disturbs the plugging mechanism...

--
Stefan

Reply via email to